Abstract

There are many Markov chains on infinite dimensional spaces whose one-step transition kernels are
mutually singular when starting from different initial conditions. We give results which prove unique
ergodicity under minimal assumptions on the one hand, and the existence of a spectral gap under
conditions reminiscent of Harris' theorem on the other.

The first uses the existence of couplings which draw the solutions together as time goes to infinity. Such
"asymptotic couplings" were central to [EMS01, Mat02b, Hai02, BM05], on which this work builds. As
in [BM05], the emphasis here is on stochastic differential delay equations.

Harris' celebrated theorem states that if a Markov chain admits a Lyapunov function whose level sets 
are "small" (in the sense that transition probabilities are uniformly bounded from below), then it admits 
a unique invariant measure and transition probabilities converge towards it at exponential speed. This 
convergence takes place in a total variation norm, weighted by the Lyapunov function. 

A second aim of this article is to replace the notion of a "small set" by the much weaker notion of a 
"d-small set," which takes the topology of the underlying space into account via a distance-like function 
d. With this notion at hand, we prove an analogue to Harris' theorem, where the convergence takes place 
in a Wasserstein-like distance weighted again by the Lyapunov function. 

This abstract result is then applied to the framework of stochastic delay equations. In this framework, 
the usual theory of Harris chains does not apply, since there are natural examples for which there exist 
no small sets (except for sets consisting of only one point). This gives a solution to the long-standing 
open problem of finding natural conditions under which a stochastic delay equation admits at most one 
invariant measure and transition probabilities converge to it. 

Keywords: Stochastic delay equation, invariant measure, Harris' theorem, weak convergence, spectral 
gap, asymptotic coupling. 
1 Introduction

There are many Markov chains on infinite dimensional spaces whose one-step transition kernels are
mutually singular when starting from different initial conditions. Many standard techniques used in the
study of Markov chains, as exposed for example in [MT93], cannot be applied to such a singular setting.
In this article, we provide two sets of results which can be applied to general Markov processes even in
such a singular setting. The first set of results gives minimal, verifiable conditions which are equivalent
to the existence of at most one invariant measure. The second set of results gives a weak version of
Harris' theorem which proves the existence of a spectral gap under the existence of a Lyapunov function
and a modified "small set" condition.

The study of the ergodic theory for stochastic partial differential equations (SPDEs) has been one of
the principal motivations to develop this theory. While even simple, formally elliptic, linear SPDEs can
have transition probabilities which are mutually singular, the bulk of recent work has been motivated
by equations driven by noise which is "degenerate" to varying degrees [EH01, BKL01, EMS01, KS02,
Mat02b, HM06, HM08a]. The current article focuses on stochastic delay differential equations (SDDEs)
and makes use of the techniques developed in the SPDE context. That the SPDE techniques are
applicable to the SDDE setting is not surprising since [EMS01] reduced the original SPDE, the
stochastic Navier-Stokes equations, to an SDDE to prove unique ergodicity. In [BM05], the same ideas
were applied directly to SDDEs. There the emphasis was on additive noise; here we generalize the results
to the setting of state dependent noise. The works [EMS01, Mat02b, Hai02, BM05] all share the central
idea of using a shift in the driving Wiener process to force solutions starting at different initial conditions
together asymptotically as time goes to infinity. In [EMS01, Mat02b, BM05], the asymptotic coupling
was achieved by driving a subset of the degrees of freedom together in finite time. Typically these were
the dynamically unstable directions, which ensured the remaining degrees of freedom would converge
to each other asymptotically. In [Hai02, HM06] the unstable directions were only stabilized sufficiently
by shifting the driving Wiener processes to ensure that all of the degrees of freedom converged together
asymptotically. This broadens the domain of applicability and is the tack taken in Section 2 to prove a
very general theorem which gives verifiable conditions which are equivalent to unique ergodicity. In
particular, this result applies to the setting when the transition probabilities are mutually singular for
many initial conditions.

A simple, instructive example which motivates our discussion is the following SDDE:

$$dX(t) = -cX(t)\,dt + g(X(t-r))\,dW(t), \tag{1.1}$$

where $r > 0$, $W$ is a standard Wiener process, $c > 0$, and $g : \mathbb{R} \to \mathbb{R}$ is a strictly positive, bounded
and strictly increasing function. This can be viewed as a Markov process $\{X_t\}_{t \ge 0}$ on the space
$\mathcal{X} = C([-r, 0], \mathbb{R})$ which possesses an invariant measure for sufficiently large $c$. However, in this particular
case, given the solution $X_t$ for any $t > 0$, the initial condition $X_0 \in \mathcal{X}$ can be recovered with probability
one, exploiting the law of the iterated logarithm for Brownian motion (see [Sch05, Section 2]). Thus, if
the initial conditions in $C([-r, 0], \mathbb{R})$ do not agree, then the transition probabilities for any step of this
chain are always mutually singular. In particular, the corresponding Markov semigroup does not have
the strong Feller property and, even worse, the only "small sets" for this system are those consisting of
one single point. The results in Section 2 nevertheless apply and allow us to show that (1.1) can have at
most one invariant measure and that convergence toward it happens at an exponential rate.
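
For intuition, a path of (1.1) can be approximated with a basic Euler-Maruyama scheme. The following is a minimal sketch and not part of the article's argument: the step size, the constant initial segment, the value of c, and the particular choice g(x) = 3/2 + arctan(x)/pi (which is strictly positive, bounded and strictly increasing) are all assumptions made for illustration.

```python
import numpy as np

def simulate_sdde(c=2.0, r=1.0, T=10.0, dt=1e-3, x_init=0.5, seed=0):
    """Euler-Maruyama sketch for dX = -c X dt + g(X(t - r)) dW.

    All parameter values are illustrative assumptions. The initial
    condition is the constant segment x_init on [-r, 0].
    """
    rng = np.random.default_rng(seed)
    lag = int(round(r / dt))      # number of grid steps in one delay length
    n = int(round(T / dt))        # number of steps on [0, T]

    def g(x):
        # Hypothetical diffusion coefficient with values in (1, 2):
        # strictly positive, bounded and strictly increasing.
        return 1.5 + np.arctan(x) / np.pi

    x = np.empty(lag + n + 1)
    x[: lag + 1] = x_init         # segment on [-r, 0]
    for k in range(lag, lag + n):
        dw = rng.normal(0.0, np.sqrt(dt))
        # The noise intensity depends on the delayed value x[k - lag].
        x[k + 1] = x[k] - c * x[k] * dt + g(x[k - lag]) * dw
    return x

path = simulate_sdde(T=1.0)
```

The state of the Markov process at time t is the whole window of the path over [t - r, t], which is why the array keeps the initial segment in front of the simulated values.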

While the main application considered in this article is that of stochastic delay equations, the
principal theorems are also applicable to a large class of stochastic PDEs driven by degenerate noise. In
particular, Theorem 5.4 in [HM08c] yields a very large class of degenerate SPDEs (essentially
semilinear SPDEs with polynomial nonlinearities driven by additive noise, satisfying a Hörmander condition)
for which it is possible to find a contracting distance $d$, see Section 5.3 below.

1.1 Overview of main results

We now summarise the two principal results of this article. The first is an abstract ergodic theorem 
which is useful in a number of different settings and gives conditions equivalent to unique ergodicity. 
The second result gives a weak version of Harris' theorem which ensures the existence of a spectral gap 
if there exists an appropriate Lyapunov function. 



1.1.1 Asymptotic coupling and unique ergodicity 

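Let $\mathcal{X}$ be a Polish space with metric $d$ and let $\mathcal{X}^\infty = \mathcal{X}^{\mathbb{N}_0}$ be the associated space of one-sided infinite
sequences. Given a Markov transition kernel $\mathcal{P}$ on $\mathcal{X}$, we will write $\mathcal{P}_{[\infty]} : \mathcal{X} \to \mathcal{M}(\mathcal{X}^\infty)$ for the
probability kernel defined by stepping with the Markov kernel $\mathcal{P}$. Here $\mathcal{M}(\mathcal{X}^\infty)$ is the space of probability
measures on $\mathcal{X}^\infty$. If $\mu$ is a probability measure on $\mathcal{X}$, then we write $\mathcal{P}_{[\infty]}\mu$ for the measure in $\mathcal{M}(\mathcal{X}^\infty)$
defined by $\int_{\mathcal{X}} \mathcal{P}_{[\infty]}(x, \cdot)\,\mu(dx)$.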

In general, we will denote by $\mathcal{M}(Y)$ the set of probability measures over a Polish space $Y$. Given
$\mu_1, \mu_2 \in \mathcal{M}(Y)$, $\mathcal{C}(\mu_1, \mu_2)$ will denote the set of all couplings of the two measures. Namely,

$$\mathcal{C}(\mu_1, \mu_2) = \{\Gamma \in \mathcal{M}(Y \times Y) : \Pi^{(i)}_{\#}\Gamma = \mu_i \text{ for } i = 1, 2\},$$

where $\Pi^{(i)}$ is the projection defined by $\Pi^{(i)}(y_1, y_2) = y_i$ and $f_{\#}\mu$ is the push-forward of the measure $\mu$
defined by $(f_{\#}\mu)(A) = \mu(f^{-1}(A))$. We define the diagonal at infinity

$$\mathcal{D} = \{(x^{(1)}, x^{(2)}) \in \mathcal{X}^\infty \times \mathcal{X}^\infty : \lim_{n \to \infty} d(x^{(1)}_n, x^{(2)}_n) = 0\}$$

as the set of paths which converge to each other asymptotically. Given two measures $m_1$ and $m_2$ on
$\mathcal{X}^\infty$, we say that $\Gamma \in \mathcal{C}(m_1, m_2)$ is an asymptotic coupling of $m_1$ and $m_2$ if $\Gamma(\mathcal{D}) = 1$.
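
A toy example may help fix the definition. The sketch below uses a hypothetical scalar chain x_{n+1} = a x_n + xi_n with |a| < 1 and standard Gaussian noise (an assumption made for illustration, not a model from this article) and drives two copies with the same noise. Each copy is individually a realisation of the chain, so the joint law is a coupling of the two path measures, and the distance between the copies equals |a|^n times the initial distance, so the pair converges to the diagonal with probability one.

```python
import numpy as np

def synchronous_coupling(x0, y0, a=0.5, n_steps=40, seed=0):
    """Run two copies of x_{n+1} = a*x_n + xi_n driven by the SAME noise.

    Returns the distances |x_n - y_n| for n = 0, ..., n_steps.
    The chain and its parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    dist = [abs(x - y)]
    for _ in range(n_steps):
        xi = rng.normal()                 # shared innovation
        x, y = a * x + xi, a * y + xi     # both marginals follow the chain
        dist.append(abs(x - y))
    return np.array(dist)

dist = synchronous_coupling(3.0, -1.0)
```

In this example the two path measures are not mutually singular, so the construction is only meant to illustrate the definition; the interest of the notion lies in settings where such simple couplings are the only ones available.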

It is reasonable to expect that if two invariant measures $\mu_1$ and $\mu_2$ are such that there exists an
asymptotic coupling of $\mathcal{P}_{[\infty]}\mu_1$ and $\mathcal{P}_{[\infty]}\mu_2$, then in fact $\mu_1 = \mu_2$. We will see that on the infinite
product structure a seemingly weaker notion is sufficient to prove $\mu_1 = \mu_2$.

To this end we define

$$\tilde{\mathcal{C}}(\mu_1, \mu_2) = \{\Gamma \in \mathcal{M}(Y \times Y) : \Pi^{(i)}_{\#}\Gamma \ll \mu_i \text{ for } i = 1, 2\}, \tag{1.2}$$

where $\Pi^{(i)}_{\#}\Gamma \ll \mu_i$ means that $\Pi^{(i)}_{\#}\Gamma$ is absolutely continuous with respect to $\mu_i$.

To state the main results of this section, we recall that an invariant measure $\mu$ for $\mathcal{P}$ is said to be
ergodic if any invariant $\varphi : \mathcal{X} \to \mathbb{R}$ (that is, $\mathcal{P}\varphi = \varphi$) is $\mu$-almost surely constant.

Theorem 1.1 Let $\mathcal{P}$ be a Markov operator on a Polish space $\mathcal{X}$ admitting two ergodic invariant
measures $\mu_1$ and $\mu_2$. The following statements are equivalent:

1. $\mu_1 = \mu_2$.

2. There exists an asymptotic coupling of $\mathcal{P}_{[\infty]}\mu_1$ and $\mathcal{P}_{[\infty]}\mu_2$.

3. There exists $\Gamma \in \tilde{\mathcal{C}}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ such that $\Gamma(\mathcal{D}) > 0$.

Remark 1.2 In (1.2), we could have replaced absolute continuity by equivalence. If we have a
'coupling' $\Gamma$ satisfying the current condition, the measure $\frac{1}{2}(\Gamma + \mathcal{P}_{[\infty]}\mu_1 \otimes \mathcal{P}_{[\infty]}\mu_2)$ satisfies that stronger
condition.



Remark 1.3 At first, it might seem surprising that it is sufficient to have an asymptotically coupling 
measure which is only equivalent and not equal to the law of the Markov process. However, it is 
important to recall that equivalence on an infinite time horizon is a much stronger statement than on a 
finite one. The key observation is that the time average of any function along a typical infinite trajectory 
gives almost surely the value of the integral of the function against some invariant measure. This was 
the key fact used in related results in [EMS01] and will be central to the proof below.

Remark 1.4 This theorem was formulated in discrete time for simplicity. Since it only concerns unique- 
ness of the invariant measure, this is not a restriction since one can apply it to a continuous time system 
simply by subsampling it at integer times. 



1.1.2 A weak version of Harris' Theorem 

We now turn to an extension of the usual Harris theorem on the exponential convergence of Harris
chains under a Lyapunov condition. Recall that, given a Markov semigroup $\{\mathcal{P}_t\}_{t \ge 0}$ over a measurable
space $\mathcal{X}$, a measurable function $V : \mathcal{X} \to \mathbb{R}_+$ is called a Lyapunov function for $\mathcal{P}_t$ if there exist strictly
positive constants $C_V$, $\gamma$, $K_V$ such that

$$\mathcal{P}_t V(x) \le C_V e^{-\gamma t} V(x) + K_V,$$

holds for every $x \in \mathcal{X}$ and every $t \ge 0$. Another omnipresent notion in the theory of Harris chains is
that of a small set. Recall that $A \subset \mathcal{X}$ is small for a Markov operator $\mathcal{P}_t$ if there exists $\delta > 0$ such that

$$\|\mathcal{P}_t(x, \cdot) - \mathcal{P}_t(y, \cdot)\|_{TV} \le 1 - \delta, \tag{1.3}$$

holds for every $x, y \in A$. (This is actually a slightly weaker notion of a small set than that found in
[MT93], but it turns out to be sufficient for the results stated in this article.) With these definitions at
hand, Harris' theorem states that:



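Theorem 1.5 Let $\{\mathcal{P}_t\}_{t \ge 0}$ be a Markov semigroup over a measurable space $\mathcal{X}$ admitting a Lyapunov
function $V$ and a time $t_\star > \gamma^{-1} \log C_V$ such that the level sets $\{x \in \mathcal{X} : V(x) \le C\}$ are small for
$\mathcal{P}_{t_\star}$ for every $C > 0$. Then, there exists a unique probability measure $\mu_\star$ on $\mathcal{X}$ that is invariant for
$\mathcal{P}_t$. Furthermore, there exist constants $\tilde{C} > 0$ and $\tilde{\gamma} > 0$ such that the transition probabilities $\mathcal{P}_t(x, \cdot)$
satisfy

$$\|\mathcal{P}_t(x, \cdot) - \mu_\star\|_{TV} \le \tilde{C}(1 + V(x))\,e^{-\tilde{\gamma} t},$$

for every $t \ge 0$ and every $x \in \mathcal{X}$.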

Remark 1.6 The convergence actually takes place in a stronger total variation norm weighted by $V$, in
which the Markov semigroup then admits a spectral gap, see [MT93, HM08b].
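
To make the Lyapunov condition concrete, consider again a hypothetical scalar chain x_{n+1} = a x_n + sigma xi_n with |a| < 1 and standard Gaussian xi_n (an illustrative assumption, not an example taken from this article). For V(x) = x^2 one computes by hand that P^n V(x) = a^{2n} x^2 + sigma^2 (1 - a^{2n}) / (1 - a^2), so the Lyapunov bound holds with C_V = 1, gamma = -2 log a and K_V = sigma^2 / (1 - a^2). The sketch below checks this bound and compares the closed form with a Monte Carlo estimate.

```python
import numpy as np

A, SIGMA = 0.8, 1.0                 # hypothetical chain parameters
GAMMA = -2.0 * np.log(A)            # decay rate in the Lyapunov bound
K_V = SIGMA**2 / (1.0 - A**2)       # additive constant in the bound

def pn_v_exact(x, n):
    """Closed form of (P^n V)(x) for V(x) = x^2."""
    return A ** (2 * n) * x**2 + SIGMA**2 * (1.0 - A ** (2 * n)) / (1.0 - A**2)

def pn_v_monte_carlo(x, n, samples=200_000, seed=0):
    """Monte Carlo estimate of (P^n V)(x) by simulating the chain."""
    rng = np.random.default_rng(seed)
    state = np.full(samples, float(x))
    for _ in range(n):
        state = A * state + SIGMA * rng.normal(size=samples)
    return float(np.mean(state**2))

def lyapunov_bound(x, n):
    """Right-hand side C_V * exp(-GAMMA * n) * V(x) + K_V with C_V = 1."""
    return np.exp(-GAMMA * n) * x**2 + K_V
```

Since C_V = 1 here, the requirement that the time exceed log(C_V)/gamma is satisfied by every positive time; in this finite-dimensional example the level sets of V are also small in the classical sense, which is exactly what fails in the delay setting.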



In this article, we normalise the total variation distance in such a way that mutually singular probability measures are at 
distance 1 of each other. This differs by a factor 2 from the definition sometimes found in the literature. 

While this result has been widely applied in the study of the long-time behaviour of Markov
processes [MT93], it does not seem to be very suitable for the study of infinite-dimensional evolution
equations because the notion of small sets requires that the transition measures not be mutually singular
for nearby points.

This suggests that one should seek a version of Harris' theorem that makes use of a relaxed
notion of a small set, allowing for transition probabilities to be mutually singular. To this effect, we will
introduce the notion of a $d$-small set for a given function $d : \mathcal{X} \times \mathcal{X} \to [0, 1]$ used to measure distances
between transition probabilities. This will be the content of Definition 4.3 below. If we lift $d$ to the
space of probability measures in the same way that one defines Wasserstein-1 distances, then this notion
is just (1.3) with the total variation distance replaced by $d$.

However, we can of course not use an arbitrary distance function $d$ and expect to obtain a convergence
result simply by combining it with Lyapunov stability. We therefore say that a distance $d$ is
contracting for $\mathcal{P}_t$ if there exists $\alpha < 1$ such that

$$d(\mathcal{P}_t(x, \cdot), \mathcal{P}_t(y, \cdot)) \le \alpha\, d(x, y), \tag{1.4}$$

holds for any two points $x, y \in \mathcal{X}$ such that $d(x, y) < 1$. This seems to be a very stringent condition at
first sight (one has the impression that (1.4) alone is already sufficient to give exponential convergence of
$\mathcal{P}_t\mu$ to $\mu_\star$ when measured in the distance $d$), but it is very important to note that (1.4) is not assumed to
hold when $d(x, y) = 1$. Therefore, the interesting class of distance functions $d$ will consist of functions
that are equal to 1 for "most" pairs of points $x$ and $y$. Compare this with the fact that the total variation
distance can be viewed as the Wasserstein-1 distance corresponding to the trivial metric that is equal to
1 for any two points that are not identical.
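
The last remark can be checked directly on a finite state space. The sketch below (an illustration under the assumption of probability vectors on a finite set, with hypothetical function names) computes the total variation distance in the normalisation used here, where mutually singular measures are at distance 1, and builds the standard maximal coupling, whose cost under the trivial metric equals that distance.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance, normalised so singular measures are at distance 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * float(np.abs(p - q).sum())

def maximal_coupling(p, q):
    """Coupling matrix that keeps as much mass as possible on the diagonal."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    common = np.minimum(p, q)            # mass that can stay on the diagonal
    gamma = np.diag(common)
    leftover = 1.0 - common.sum()        # mass that must move off the diagonal
    if leftover > 0:
        gamma = gamma + np.outer(p - common, q - common) / leftover
    return gamma

def trivial_metric_cost(gamma):
    """Expected cost under the metric equal to 1 off the diagonal and 0 on it."""
    return float(gamma.sum() - np.trace(gamma))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
gamma = maximal_coupling(p, q)
```

Replacing the trivial metric by a function d that is smaller than 1 for nearby points lowers the transport cost between measures concentrated close to each other, which is what allows the lifted distance to contract even when the measures are mutually singular.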

With these definitions at hand, a slightly simplified version of our main abstract theorem states that: 

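Theorem 1.7 Let $\{\mathcal{P}_t\}_{t \ge 0}$ be a Markov semigroup over a Polish space $\mathcal{X}$ that admits a Lyapunov
function $V$. Assume furthermore that there exist $t_\star > \gamma^{-1} \log C_V$ and a lower semi-continuous metric
$d : \mathcal{X} \times \mathcal{X} \to [0, 1]$ such that

• $d$ is contracting for $\mathcal{P}_{t_\star}$,

• level sets of $V$ are $d$-small for $\mathcal{P}_{t_\star}$.

Then there exists a unique invariant measure $\mu_\star$ for $\mathcal{P}_t$ and the convergence $d(\mathcal{P}_t(x, \cdot), \mu_\star) \to 0$ is
exponential for every $x \in \mathcal{X}$.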

1.2 Structure of paper 

In Section 2 we give the proof of Theorem 1.1 as well as a result which under related hypotheses ensures
that the transition probabilities starting from any point converge to the expected invariant measure. In
Section 3 we apply the results of the preceding theorems to prove the unique ergodicity and convergence
of transition probabilities for a wide class of SDDEs, including those with state dependent coefficients. In
Section 4 we prove a weak version of Harris' ergodic theorem which implies exponential convergence
in a type of weighted Wasserstein-1 distance on measures and an associated spectral gap. In Section 5
we apply these results to the SDDE setting under the additional assumption of a Lyapunov function in
order to obtain a spectral gap result. Lastly, in Section 5.3 we show how to apply the results to the
SPDE setting, thus providing an alternative proof to the results in [HM08a].

2 Asymptotic coupling

This section contains the proof of Theorem 1.1 and a number of results which use related ideas. The
goal throughout this section is to use minimal assumptions. We also provide a criterion that yields
convergence rates toward the invariant measure using a coupling argument. However, in the case of
exponential convergence, a somewhat cleaner statement will be provided by the weak version of Harris'
theorem in Section 4.



2.1 Proof of Theorem 1.1: Unique ergodicity through asymptotic coupling

The proof given below abstracts the essence of the arguments given in [EMS01, BM05]. A version of
this theorem is presented in [Mat07]. The basic idea is to leverage the fact that if the initial condition
is distributed as an ergodic invariant measure, then Birkhoff's ergodic theorem implies that the average
along a trajectory of a given function is an almost sure property of the trajectories.



Proof of Theorem 1.1. Throughout this proof we will use the shorthand notation $m_i = \mathcal{P}_{[\infty]}\mu_i$ for
$i = 1, 2$. Clearly 1. implies 2. because if $\mu_1 = \mu_2$, then we can take the asymptotic coupling to be the
measure $m_1$ (which then equals $m_2$) pushed onto the diagonal of $\mathcal{X}^\infty \times \mathcal{X}^\infty$. Clearly 2. implies 3. since
$\mathcal{C}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2) \subset \tilde{\mathcal{C}}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ and the definition of asymptotic coupling implies that there
exists a $\Gamma \in \mathcal{C}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ with $\Gamma(\mathcal{D}) = 1$. We now turn to the meat of the result, proving that 3.
implies 1.

Defining the shift $\theta : \mathcal{X}^\infty \to \mathcal{X}^\infty$ by $(\theta x)_k = x_{k+1}$ for $k \in \mathbb{N}_0$, we observe that $\theta_{\#} m_i = m_i$ for
$i = 1, 2$. In other words, $m_i$ is an invariant measure for the map $\theta$. In addition, one sees that $m_i$ is an
ergodic invariant measure for the shift since $\mu_i$ was ergodic for $\mathcal{P}$.

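Fixing any bounded, globally Lipschitz $\varphi : \mathcal{X} \to \mathbb{R}$, we extend it to a bounded function $\tilde{\varphi} : \mathcal{X}^\infty \to \mathbb{R}$
by setting $\tilde{\varphi}(x) = \varphi(x_0)$ for $x \in \mathcal{X}^\infty$. Now Birkhoff's ergodic theorem and the fact that the $m_i$ are
ergodic for $\theta$ ensure the existence of sets $A_i^\varphi \subset \mathcal{X}^\infty$ with $m_i(A_i^\varphi) = 1$ so that if $x \in A_i^\varphi$ then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \varphi(x_k) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \tilde{\varphi}(\theta^k x) = \int_{\mathcal{X}^\infty} \tilde{\varphi}(x)\, m_i(dx) = \int_{\mathcal{X}} \varphi(z)\, \mu_i(dz). \tag{2.1}$$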

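Here, the first and last equalities follow from the definition of $\tilde{\varphi}$. Now let $\Gamma \in \tilde{\mathcal{C}}(m_1, m_2)$ with
$\Gamma(\mathcal{D}) > 0$. Since $\Pi^{(i)}_{\#}\Gamma \ll m_i$ and both are probability measures we know that $(\Pi^{(i)}_{\#}\Gamma)(A_i^\varphi) =
m_i(A_i^\varphi) = 1$ for $i = 1, 2$. This in turn implies that $\Gamma(\mathcal{X}^\infty \times A_2^\varphi) = \Gamma(A_1^\varphi \times \mathcal{X}^\infty) = 1$, so that
$\Gamma(A_1^\varphi \times A_2^\varphi) = 1$. Hence if $\tilde{\mathcal{D}} = \mathcal{D} \cap (A_1^\varphi \times A_2^\varphi)$, we have that $\Gamma(\tilde{\mathcal{D}}) > 0$. In particular, this implies that
$\tilde{\mathcal{D}}$ is not empty. Observe that for any $(x^{(1)}, x^{(2)}) \in \tilde{\mathcal{D}}$ we know that for $i = 1, 2$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \varphi(x^{(i)}_k) = \int_{\mathcal{X}} \varphi(z)\, \mu_i(dz).$$

On the other hand, since $(x^{(1)}, x^{(2)}) \in \tilde{\mathcal{D}} \subset \mathcal{D}$, we know that $\lim_{n \to \infty} d(x^{(1)}_n, x^{(2)}_n) = 0$. Combining these
facts gives

$$\Big| \int_{\mathcal{X}} \varphi\, d\mu_1 - \int_{\mathcal{X}} \varphi\, d\mu_2 \Big| = \lim_{n \to \infty} \Big| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(x^{(1)}_k) - \frac{1}{n} \sum_{k=0}^{n-1} \varphi(x^{(2)}_k) \Big| \le \mathrm{Lip}(\varphi) \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} d(x^{(1)}_k, x^{(2)}_k) = 0,$$

where both the first equality and the final vanishing of the limit follow from the fact that we choose
$(x^{(1)}, x^{(2)})$ in $\tilde{\mathcal{D}}$. Therefore,

$$\int_{\mathcal{X}} \varphi(x)\, \mu_1(dx) = \int_{\mathcal{X}} \varphi(x)\, \mu_2(dx)$$

for any Lipschitz and bounded $\varphi$, which implies that $\mu_1 = \mu_2$. □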
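
The key inequality of the proof, namely that the gap between the two time averages of a Lipschitz function is controlled by the time average of the distance between the coupled paths, can be observed numerically. The sketch below again uses a hypothetical contracting scalar chain driven by shared noise (an assumption for illustration only) and the bounded 1-Lipschitz test function tanh.

```python
import numpy as np

def coupled_paths(x0, y0, a=0.5, n_steps=2000, seed=1):
    """Two copies of x_{n+1} = a*x_n + xi_n sharing the same noise sequence."""
    rng = np.random.default_rng(seed)
    xs, ys = np.empty(n_steps), np.empty(n_steps)
    x, y = float(x0), float(y0)
    for k in range(n_steps):
        xs[k], ys[k] = x, y
        xi = rng.normal()
        x, y = a * x + xi, a * y + xi
    return xs, ys

phi = np.tanh                      # bounded test function with Lipschitz constant 1
xs, ys = coupled_paths(5.0, -5.0)

# Gap between the two Birkhoff averages of phi along the coupled paths.
gap = abs(phi(xs).mean() - phi(ys).mean())
# Time average of the distance between the paths, times Lip(phi) = 1.
bound = np.abs(xs - ys).mean()
```

Because the distance between the paths tends to zero, its time average tends to zero as well, which forces the two Birkhoff limits, and hence the integrals against the two invariant measures, to agree.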

Remark 2.1 Note that it is not true in general that the uniqueness of the invariant measure implies that
there exist asymptotic couplings for any two starting points! See for example [Lin92, CW00] for a
discussion on the relation between coupling, shift coupling, ergodicity, and mixing.

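Corollary 2.2 Let $\mathcal{P}$ be a Markov operator on a Polish space $\mathcal{X}$. If there exists a measurable set $A \subset \mathcal{X}$
with the following two properties:

• $\mu(A) > \frac{1}{2}$ for any invariant probability measure $\mu$ of $\mathcal{P}$,

• there exists a measurable map $A \times A \ni (x, y) \mapsto \Gamma_{x,y} \in \tilde{\mathcal{C}}(\mathcal{P}_{[\infty]}\delta_x, \mathcal{P}_{[\infty]}\delta_y)$ such that $\Gamma_{x,y}(\mathcal{D}) > 0$
for every $x, y \in A$,

then $\mathcal{P}$ has at most one invariant probability measure.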

Remark 2.3 The measurability of the map $\Gamma$ means that the map $(x, y) \mapsto \int \varphi\, d\Gamma_{x,y}$ is measurable for
every bounded continuous function $\varphi : \mathcal{X}^\infty \times \mathcal{X}^\infty \to \mathbb{R}$.

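Proof of Corollary 2.2. Assume that there are two invariant measures $\mu_1$ and $\mu_2$. By the ergodic
decomposition, we can assume that $\mu_1$ and $\mu_2$ are both ergodic since any invariant measure can be decomposed
into ergodic invariant measures. We extend the definition of $\Gamma$ to $\mathcal{X} \times \mathcal{X}$ by $\Gamma_{x,y} = \mathcal{P}_{[\infty]}\delta_x \times \mathcal{P}_{[\infty]}\delta_y$
for $(x, y) \notin A \times A$. For measurable sets $B \subset \mathcal{X}^\infty \times \mathcal{X}^\infty$ define the measure $\Gamma$ by

$$\Gamma(B) = \int_{\mathcal{X} \times \mathcal{X}} \Gamma_{x,y}(B)\, \mu_1(dx)\, \mu_2(dy)$$

and notice that $\Gamma \in \tilde{\mathcal{C}}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ by construction. Furthermore, by the assumption that $\Gamma_{x,y}(\mathcal{D}) > 0$
for $(x, y) \in A \times A$, and since $\mu_1(A)\mu_2(A) > 0$, we see that $\Gamma(\mathcal{D}) > 0$. Hence Theorem 1.1 implies that $\mu_1 = \mu_2$. □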

2.2 Convergence of transition probabilities

In this section, we give a simple criterion for the convergence of transition probabilities towards an 
invariant measure under extremely weak conditions that essentially state that: 

1. There exists a point $x_0 \in \mathcal{X}$ such that the process returns "often" to arbitrarily small
neighbourhoods of $x_0$.

2. The coupling probability between trajectories starting at $x$ and $y$ converges to 1 as $y \to x$.

More precisely, we have the following result:

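Theorem 2.4 Let $\mathcal{P}$ be a Markov kernel over a Polish space $\mathcal{X}$ with metric $d$ that admits an ergodic
invariant measure $\mu_\star$. Assume that there exist $B \subset \mathcal{X}$ with $\mu_\star(B) > \frac{1}{2}$ and $x_0 \in \mathcal{X}$ such that, for every
neighbourhood $U$ of $x_0$ there exists $k > 0$ such that $\inf_{y \in B} \mathcal{P}^k(y, U) > 0$. Assume furthermore that
there exists a measurable map

$$\Gamma : \mathcal{X} \times \mathcal{X} \to \mathcal{M}(\mathcal{X}^\infty \times \mathcal{X}^\infty), \qquad (y, y') \mapsto \Gamma_{y,y'},$$

with the property that $\Gamma_{y,y'} \in \mathcal{C}(\mathcal{P}_{[\infty]}\delta_y, \mathcal{P}_{[\infty]}\delta_{y'})$ for every $(y, y')$ and such that, for every $\varepsilon > 0$ and
every $x \in \mathcal{X}$, there exists a neighbourhood $U$ of $x$ such that $\inf_{y, y' \in U} \Gamma_{y,y'}(\mathcal{D}) \ge 1 - \varepsilon$.
Then, $\mathcal{P}^n(z, \cdot) \to \mu_\star$ weakly as $n \to \infty$ for every $z \in \operatorname{supp} \mu_\star$.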

Remark 2.5 This convergence result is valid even in situations where the invariant measure is not
unique. If the process is irreducible however, then $\operatorname{supp} \mu_\star = \mathcal{X}$ and the convergence holds for
every $z \in \mathcal{X}$ (implying in particular that $\mu_\star$ is unique).

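Proof. We denote by $\mathcal{D}^\varepsilon_N$ the subset of $\mathcal{X}^\infty \times \mathcal{X}^\infty$ given by

$$\mathcal{D}^\varepsilon_N = \{(X, Y) : d(X_n, Y_n) < \varepsilon \ \ \forall n \ge N\},$$

so that $\mathcal{D} = \bigcap_{\varepsilon > 0} \bigcup_{N > 0} \mathcal{D}^\varepsilon_N$.

Note first that by the ergodicity of $\mu_\star$ and Birkhoff's ergodic theorem, there exists a set $A$ with
$\mu_\star(A) = 1$ and such that

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \mathbf{1}_B(X_t) = \mu_\star(B)$$

holds almost surely for every process $X_t$ with $X_0 \in A$.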

Fix now $z \in A$ and $\varepsilon > 0$ and let $x_0$ be as in the first assumption of the statement. Fix furthermore
a neighbourhood $U$ of $x_0$ such that $\Gamma_{y,y'}(\mathcal{D}) \ge 1 - \varepsilon$ for $y, y' \in U$. Define the function $(y, y') \mapsto N_{y,y'}$
by

$$N_{y,y'} = \inf\{N > 0 : \Gamma_{y,y'}(\mathcal{D}^\varepsilon_N) \ge 1 - 2\varepsilon\}. \tag{2.2}$$

It follows from the choice of $U$ and the fact that $\mathcal{D} \subset \bigcup_{N > 0} \mathcal{D}^\varepsilon_N$ that one has $N_{y,y'} < \infty$ for every pair
$y, y' \in U$. Let now $Y_t$ be a Markov process generated by $\mathcal{P}$ with initial distribution $\mu_\star$ and let $Z_t$ be an
independent process with initial condition $z$. Since one has the bound $\mathbf{1}_B(x)\mathbf{1}_B(z) \ge \mathbf{1}_B(x) + \mathbf{1}_B(z) - 1$,
it follows from the definition of $A$ that

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \mathbf{1}_B(Y_t)\mathbf{1}_B(Z_t) \ge 2\mu_\star(B) - 1 > 0, \tag{2.3}$$
almost surely. Let now $\tau = \inf\{t \ge 0 : Y_t \in U \text{ and } Z_t \in U\}$. Before we proceed, let us argue that $\tau$
is almost surely finite. We know indeed by assumption that there exist $T_U > 0$ and $\alpha > 0$ such that
$\mathcal{P}^{T_U}(y, U) \ge \alpha$ for every $y \in B$. Furthermore, setting $\tau_0 = 0$ and defining $\tau_k$ recursively by

$$\tau_k = \inf\{t \ge \tau_{k-1} + T_U : Y_t \in B \text{ and } Z_t \in B\},$$

we have the bound

$$\begin{aligned}
P(\tau > T) &\le P(\tau > T \mid \tau_k < T - T_U)\,P(\tau_k < T - T_U) + P(\tau_k \ge T - T_U) \\
&\le P(\tau \ne \tau_1 + T_U \text{ and } \cdots \text{ and } \tau \ne \tau_k + T_U) + P(\tau_k \ge T - T_U) \\
&\le (1 - \alpha)^k + P(\tau_k \ge T - T_U),
\end{aligned}$$

where the last inequality follows from the strong Markov property. Since we know from (2.3) that the
$\tau_k$ are almost surely finite, we can make $P(\tau_k \ge T - T_U)$ arbitrarily small for fixed $k$ by making $T$
large.

It follows that there exists $T_0 > 0$ with $P(\tau \le T_0) > 1 - \varepsilon$. Let now $\mu_0$ be the law of the stopped
process at time $T_0$, that is $\mu_0 = \mathrm{Law}(Y_{T_0 \wedge \tau}, Z_{T_0 \wedge \tau})$. It follows from the definitions of $T_0$ and $\tau$ that
$\mu_0(U \times U) > 1 - \varepsilon$.

Since $N_{y,y'}$ is finite on $U \times U$, we can find a sufficiently large value $T_1 > 0$ such that

$$\mu_0(\{(y, y') \in U \times U : N_{y,y'} < T_1\}) > 1 - 2\varepsilon.$$

Let now $\Gamma_{Y,Z}$ be the coupling between $\mathcal{P}_{[\infty]}\mu_\star$ and $\mathcal{P}_{[\infty]}\delta_z$ obtained by first running two independent
copies $(Y_t, Z_t)$ up to the stopping time $\tau$ and then running them with the coupling $\Gamma_{Y_\tau, Z_\tau}$. This is
indeed a coupling by the strong Markov property (recall that we are in the discrete time case). Setting
$T = T_0 + T_1$, it follows immediately from the construction that under the coupling $\Gamma_{Y,Z}$, we have
$P(d(Y_n, Z_n) < \varepsilon) > 1 - 4\varepsilon$ for each $n > T$, thus yielding the convergence of $\mathcal{P}^n(z, \cdot)$ to $\mu_\star$ as required.

Let us now extend this argument to more general starting points and fix an arbitrary $z \in \operatorname{supp} \mu_\star$ and
$\varepsilon > 0$. Since $A$ is dense in $\operatorname{supp} \mu_\star$, it follows from our assumption on $\Gamma$ that there exists $z' \in A$ such
that $\Gamma_{z,z'}(\mathcal{D}) > 1 - \varepsilon$. In particular, we can find a time $T_2$ such that $\Gamma_{z,z'}(\mathcal{D}^\varepsilon_{T_2}) > 1 - 2\varepsilon$. Since on the
other hand, we know that $\mathcal{P}^n(z', \cdot) \to \mu_\star$ weakly, it immediately follows that we can find $T_3 \ge T_2$ and
for each $n \ge T_3$ a coupling $\Gamma$ between $\mathcal{P}^n(z, \cdot)$ and $\mu_\star$ such that $\Gamma(\{(y, y') : d(y, y') < 2\varepsilon\}) > 1 - 3\varepsilon$,
thus concluding the proof. □

2.3 More convergence results: rates and convergence for all initial conditions 

In this section, we continue to develop the ideas of the previous sections to show how to obtain the
convergence of the transition probabilities for every initial condition and how to obtain a rate of
convergence. We essentially follow the ideas laid out in [Mat02b]. They are sufficient to give exponential
convergence, but not a convergence in any operator norm. Here we will not attempt to apply the results
to our SDDE setting since Theorem 1.5 provides stronger results in our setting of interest. Nonetheless,
the ideas and imagery presented in this section are useful to build intuition. The technique is also useful
in situations where exponential convergence does not hold.

Define the 1-Wasserstein distance for two probability measures $\mu_i$ on $\mathcal{X}$ by

$$d_1(\mu_1, \mu_2) = \sup_{f \in \mathrm{Lip}_1} \int f(x)\, \mu_1(dx) - \int f(x)\, \mu_2(dx),$$

where $\mathrm{Lip}_1$ are the Lipschitz functions $f : \mathcal{X} \to \mathbb{R}$ with Lipschitz constant one.
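
As a small illustration of this definition, take the real line with its usual distance (an assumption made only for this example; the article works on a general Polish space). For two empirical measures with equally many atoms, the 1-Wasserstein distance is the mean absolute difference of the sorted samples, and any single 1-Lipschitz test function gives a lower bound through the supremum above. The function names below are hypothetical.

```python
import numpy as np

def w1_empirical(xs, ys):
    """1-Wasserstein distance between two empirical measures on the real line.

    Sketch assuming both samples have the same size, in which case the
    optimal transport plan matches sorted samples in order.
    """
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equally many samples in each set")
    return float(np.mean(np.abs(xs - ys)))

def dual_lower_bound(xs, ys, f):
    """Value of the dual functional for one 1-Lipschitz test function f."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    return float(np.mean(f(xs)) - np.mean(f(ys)))

samples_a = [2.0, 0.0, 1.0]
samples_b = [3.0, 1.0, 2.0]
distance = w1_empirical(samples_a, samples_b)
```

For a pure shift, as in the sample data above, the identity test function already attains the supremum, while a bounded test function such as tanh gives a strictly smaller value.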

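Let $\mu_1, \mu_2$ be two probability measures on $\mathcal{X}$ and $\Gamma \in \mathcal{C}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ and let $Z \in \mathcal{X}^\infty \times \mathcal{X}^\infty$,
with $Z_n = (Z^{(1)}_n, Z^{(2)}_n)$, be the stochastic process on $\mathcal{X} \times \mathcal{X}$ with paths distributed as $\Gamma$. Let $\mathcal{G}_n$ be the
$\sigma$-algebra generated by $Z_n$ and $\Pi_n$ be the projection on $\mathcal{X}^\infty \times \mathcal{X}^\infty$ defined by $(\Pi_n x)_k = x_{k+n}$.

We say that $\Gamma \in \mathcal{C}(\mathcal{P}_{[\infty]}\mu_1, \mathcal{P}_{[\infty]}\mu_2)$ is a marginally Markovian coupling if for any stopping time $\tau$
(adapted to $\mathcal{G}_n$), the conditional distribution $\mathrm{Law}(\Pi_\tau Z \mid \mathcal{G}_\tau)$ of $\Pi_\tau Z$ given $\mathcal{G}_\tau$ belongs almost surely to
$\mathcal{C}(\mathcal{P}_{[\infty]}\delta_{Z^{(1)}_\tau}, \mathcal{P}_{[\infty]}\delta_{Z^{(2)}_\tau})$, where $Z_n = (Z^{(1)}_n, Z^{(2)}_n)$. Notice that this is weaker than assuming that $Z_n$ is a
Markov process.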

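Let $\varrho : \mathbb{N} \to (0, \infty)$ be a strictly decreasing function with $\lim_{n \to \infty} \varrho(n) = 0$. We define the
"neighborhood" of the diagonal

$$\Delta_\varrho = \{(x^{(1)}, x^{(2)}) \in \mathcal{X}^\infty \times \mathcal{X}^\infty : d(x^{(1)}_n, x^{(2)}_n) < \varrho(n) \text{ for all } n\}.$$

For any stochastic process $Z$ on $\mathcal{X} \times \mathcal{X}$ with $Z_n = (Z^{(1)}_n, Z^{(2)}_n)$, we define the stopping time

$$\tau_\varrho(Z) = \inf\{n \ge 1 : d(Z^{(1)}_n, Z^{(2)}_n) \ge \varrho(n)\} \tag{2.4}$$

and for $B \subset \mathcal{X} \times \mathcal{X}$ we define the hitting time

$$\sigma_B(Z) = \inf\{n \ge 0 : Z_n \in B\}. \tag{2.5}$$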



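Theorem 2.6 Consider a Markov operator $\mathcal{P}$ as before over a Polish space $\mathcal{X}$ with distance function
$d \le 1$.

Fix a strictly decreasing rate function $\varrho$ and a set $B \subset \mathcal{X} \times \mathcal{X}$. Assume that there exists a measurable
map $z_0 \mapsto \Gamma_{z_0} \in \mathcal{C}(\mathcal{P}_{[\infty]}\delta_{z_0^{(1)}}, \mathcal{P}_{[\infty]}\delta_{z_0^{(2)}})$, where $z_0 = (z_0^{(1)}, z_0^{(2)}) \in \mathcal{X} \times \mathcal{X}$, which is a marginally
Markovian coupling such that, if $Z$ with $Z_n = (Z^{(1)}_n, Z^{(2)}_n)$ is distributed as $\Gamma_{z_0}$, then the following
assumptions hold:

1. If $z_0 \notin B$, then $\sigma_B(Z)$ is finite almost surely.

2. There exists an $\alpha > 0$ so that if $z_0 \in B$ then $\Gamma_{z_0}(\Delta_\varrho) \ge \alpha$.

Then for all $(z^{(1)}, z^{(2)}) \in \mathcal{X} \times \mathcal{X}$,

$$d_1(\mathcal{P}^n \delta_{z^{(1)}}, \mathcal{P}^n \delta_{z^{(2)}}) \to 0 \quad \text{as } n \to \infty.$$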

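Proof. To prove the result, we will construct a new coupling on $\mathcal{X}^\infty \times \mathcal{X}^\infty$ from the coupling $\Gamma$. We will
do this by constructing a process on excursions from $B$. The state space of our process of excursions
will be given by:

$$\mathbf{X} = \bigcup_{k=0}^{\infty} \big((\mathcal{X}^k \times \mathcal{X}^k) \times B\big) \quad \text{and} \quad \bar{\mathbf{X}} = (\mathcal{X}^\infty \times \mathcal{X}^\infty) \cup \mathbf{X}.$$

In words, $\mathbf{X}$ is the space of finite (but arbitrary) length trajectories taking values in the product space
$\mathcal{X} \times \mathcal{X}$ and ending in $B$. The space $\bar{\mathbf{X}}$ furthermore contains trajectories in $\mathcal{X} \times \mathcal{X}$ of infinite length.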

To build the process on excursions, we begin by constructing a Markov transition kernel $Q : B \to
\mathcal{M}(\bar{\mathbf{X}})$ from the $\Gamma$'s in the following way. For any $z \in B$, let $Z'$ be a $(\mathcal{X}^\infty \times \mathcal{X}^\infty)$-valued random variable
distributed according to the measure $\Gamma_z$ and set $\tau = \tau_\varrho(Z')$ with $\tau_\varrho$ as in (2.4). If $\tau = \infty$, then we set
$\mathbf{Z} = Z'$. If $\tau < \infty$ and $Z'_\tau \in B$, then we set $\mathbf{Z} = (Z'_1, \cdots, Z'_\tau)$. Otherwise, let $Z''$ be the
$(\mathcal{X}^\infty \times \mathcal{X}^\infty)$-valued random variable distributed according to the measure $\Gamma_{Z'_\tau}$. Since $\Gamma$ is marginally Markovian and
$\tau$ is a stopping time, we know that the law of $\Pi_\tau Z'$ is a coupling of $\mathcal{P}_{[\infty]}\delta_{Z'^{(1)}_\tau}$ and $\mathcal{P}_{[\infty]}\delta_{Z'^{(2)}_\tau}$ almost
surely. Hence we can replace the trajectory of $Z'$ after time $\tau$ with the piece of trajectory $Z''$ and the
combined trajectory will still be a coupling starting from the initial data. Setting $\sigma = \sigma_B(Z'')$ (which is
finite almost surely) we define $\mathbf{Z}$ by $\mathbf{Z} = (Z'_1, \cdots, Z'_\tau, Z''_1, \cdots, Z''_\sigma)$.

In all cases, $\mathbf{Z}$ is either a trajectory of infinite length contained in $\Delta_\varrho$ or it is a segment of finite length
ending in the set $B$. Hence $\mathbf{Z}$ is a $\bar{\mathbf{X}}$-valued random variable and we define $Q_z(\cdot)$ to be the distribution of
$\mathbf{Z}$. To extend the definition to a kernel $Q : \mathbf{X} \to \mathcal{M}(\bar{\mathbf{X}})$, we simply define $Q_z$ for $z = (z_1, \cdots, z_k) \in \mathbf{X}$
by $Q_z = Q_{z_k}$ since all trajectories in $\mathbf{X}$ terminate in $B$.

We now construct our Markov process on $\bar{\mathbf{X}}$ which we will denote by $\mathbf{Z}_n = (\mathbf{Z}^{(1)}_n, \mathbf{Z}^{(2)}_n)$. We
use $Q$ which was constructed in the preceding paragraph as the Markov transition kernel. If ever the
segment drawn is of infinite length, then the process simply stops. The only missing element is the
initial condition. If $z_0 \in B$ then we take $\mathbf{Z}_0 = z_0$. If $z_0 \notin B$, then we take $\mathbf{Z}_0 = (z_0, Z'_1, \cdots, Z'_\sigma)$
where $Z'$ is an $\bar{\mathbf{X}}$-valued random variable distributed according to $\Gamma_{z_0}$ and $\sigma = \sigma_B(Z')$.

Now we define $l_n$ to be the length of $\mathbf{Z}_n$. Let $n^* = \inf\{n : l_n = \infty\}$. Since for all $z \in B$,
$\Gamma_z(\Delta_\varrho) \ge \alpha > 0$, we have $P(n^* > n) \le (1 - \alpha)^n$ and thus $n^*$ is almost surely finite. Lastly we define
$t^* = \sum_{k=0}^{n^* - 1} l_k$. Since $n^*$ and the $l_k$ are all almost surely finite, one sees that $t^*$ is almost surely finite.

Finally, we are ready to perform the desired calculation. We will denote by $Z_t = (Z^{(1)}_t, Z^{(2)}_t)$
the trajectory in $\mathcal{X}^\infty \times \mathcal{X}^\infty$ obtained by concatenating together the segments produced by running the
Markov chain $\mathbf{Z}_n$ constructed in the preceding paragraph. For any $f \in \mathrm{Lip}_1(\mathcal{X})$ one has

$$E f(Z^{(1)}_t) - E f(Z^{(2)}_t) \le E\, d(Z^{(1)}_t, Z^{(2)}_t)\big(\mathbf{1}_{t^* > t/2} + \mathbf{1}_{t^* \le t/2}\big) \le P(t^* > t/2) + \varrho(t/2). \tag{2.6}$$

Observe that the right hand side is uniform for any $f \in \mathrm{Lip}_1$. By assumption $\varrho(t/2) \to 0$ as $t \to \infty$ and
since $t^*$ is almost surely finite $P(t^* > t/2) \to 0$ as $t \to \infty$. □

The following corollary gives a rate of convergence assuming one can control an appropriate moment
of $t^*$. As in [Hai02, Mat02b], this is often done by assuming an appropriate Lyapunov structure. This
result does not prove convergence in any operator norm. In this way, it is inferior to Theorem 1.5 which
we prefer. (The fact that the norm does not allow test functions $f \le V$ can be rectified with more work
under additional assumptions.)



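Corollary 2.7 In the setting of Theorem 2.6 let $t^*(z_0)$ be the stopping time defined in the proof of
Theorem 2.6 when starting from $z_0 = (z_0^{(1)}, z_0^{(2)}) \in \mathcal{X} \times \mathcal{X}$. If $E(1/\varrho(t^*(z_0))) \le \Phi(z_0)$ for some
$\Phi : \mathcal{X} \times \mathcal{X} \to (0, \infty)$, then

$$d_1(\mathcal{P}^t \delta_{z_0^{(1)}}, \mathcal{P}^t \delta_{z_0^{(2)}}) \le (1 + \Phi(z_0))\,\varrho(t/2).$$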

Proof. First observe that the Markov inequality implies that $P(t^* > t/2) = P(1/\varrho(t^*) > 1/\varrho(t/2)) \le
\varrho(t/2)\,E(1/\varrho(t^*)) \le \varrho(t/2)\,\Phi(z_0)$. Returning to (2.6), we see that this estimate completes the proof of
the desired result. □
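
For completeness, the two ingredients of this proof can be chained in one display. The following restates the argument above, writing $\varrho$ for the rate function, $t^*$ for the stopping time from the proof of Theorem 2.6 and $\Phi$ for the assumed moment bound; it adds no new assumption.

```latex
\begin{align*}
d_1\bigl(\mathcal{P}^t\delta_{z_0^{(1)}}, \mathcal{P}^t\delta_{z_0^{(2)}}\bigr)
  &\le P(t^* > t/2) + \varrho(t/2)
     && \text{by (2.6), uniformly over } f \in \mathrm{Lip}_1 \\
  &\le \varrho(t/2)\, E\bigl(1/\varrho(t^*)\bigr) + \varrho(t/2)
     && \text{by Markov's inequality} \\
  &\le \bigl(1 + \Phi(z_0)\bigr)\, \varrho(t/2)
     && \text{by the assumption on } \Phi .
\end{align*}
```

The first equality in the Markov step uses that the rate function is strictly decreasing, so that the events $\{t^* > t/2\}$ and $\{1/\varrho(t^*) > 1/\varrho(t/2)\}$ coincide.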

Remark 2.8 For this result to be useful, one needs control over $E(1/\varrho(t^*(z_0)))$. First observe that
since $t^* = \sum_{k=0}^{n^* - 1} l_k$ and $P(n^* > k) \le (1 - \alpha)^k$, the main difficulty is controlling the appropriate
moment of $l_n$. The length $l_n$ consists of two parts. The first is $\tau$, the time to exit $\Delta_\varrho$, and the second is $\sigma$, the
time to return to $B$. The moments of $\tau$ depend on how quickly $\Gamma(\Delta^n_\varrho) - \Gamma(\Delta_\varrho)$ goes to zero as a
function of $n$, where $\Delta^n_\varrho = \{(x^{(1)}, x^{(2)}) \in \mathcal{X}^\infty \times \mathcal{X}^\infty : d(x^{(1)}_k, x^{(2)}_k) < \varrho(k) \text{ for all } k \le n\}$. This gives
information about how long it takes for $Z$ to leave $\Delta_\varrho$ when it is conditioned not to stay inside
$\Delta_\varrho$ for all times. If this exit time has heavy tails, then it can retard the convergence rate. The return
times to $B$ also influence the convergence rate. Such a return time is often controlled by a Lyapunov
function. In the context of obtaining exponential convergence, some of these points are explored in
[Mat02b, Hai02]. Subexponential convergence rates via Lyapunov functions have been explored for
Harris chains in [MT93, Ver99, BCG08, DFG09].



3 Application of the uniqueness and convergence criteria to SDDEs 
3.1 Application of the uniqueness criterion to SDDEs 

Fix $r > 0$ and let $\mathcal{C} = C([-r, 0], \mathbb{R}^d)$ denote the phase space of a general finite-dimensional delay
equation with delay $r$ endowed with the sup-norm $\|\cdot\|$. For a function or a process $X$ defined on
$[t-r, t]$ we write $X_t(s) := X(t+s)$, $s \in [-r, 0]$. Consider the following stochastic functional
differential equation:

\[
dX(t) = f(X_t)\,dt + g(X_t)\,dW(t), \qquad X_0 = \eta \in \mathcal{C}, \tag{3.1}
\]

where $f \colon \mathcal{C} \to \mathbb{R}^d$ and $g \colon \mathcal{C} \to \mathbb{R}^{d \times m}$, and $W_t = (W_t^{(1)}, \dots, W_t^{(m)})$ is a standard Wiener process.
We will provide conditions on $f$ and $g$ which ensure that, for every initial condition $X_0 = \eta \in \mathcal{C}$,
equation (3.1) has a unique pathwise solution which can then be viewed as a $\mathcal{C}$-valued strong Markov
process. The problem of existence and/or uniqueness of an invariant measure of such a process (or
similar processes) has been addressed by a number of authors, see for example [IN64, Sch84, BM05,
RRG06, EGS09].
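To make the setting concrete, the following is a minimal Euler-Maruyama sketch for a scalar equation of the form (3.1). It is an illustration only and plays no role in the proofs. The drift $f(x) = -c\,x(0)$, the diffusion $g(x) = \psi(x(-r))$ with $\psi(u) = 1.5 + 0.5\tanh(u)$, the step size, and the helper names `simulate_sdde` and `segment` are all assumptions made for the example.

```python
import math
import random


def simulate_sdde(eta, r=1.0, c=1.0, T=5.0, dt=0.01, seed=0):
    """Euler-Maruyama for dX = -c X(t) dt + psi(X(t - r)) dW(t) on [0, T].

    eta is a function on [-r, 0] giving the initial segment X_0 = eta.
    Returns the grid values X(-r), X(-r + dt), ..., X(T).
    """
    rng = random.Random(seed)
    lag = round(r / dt)      # number of grid steps in one delay length
    steps = round(T / dt)

    # Illustrative psi: strictly positive, bounded, strictly increasing.
    def psi(u):
        return 1.5 + 0.5 * math.tanh(u)

    # History on the grid -r, -r + dt, ..., 0.
    path = [eta(-r + k * dt) for k in range(lag + 1)]
    for _ in range(steps):
        x_now = path[-1]
        x_delayed = path[-1 - lag]          # value one delay in the past
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        path.append(x_now - c * x_now * dt + psi(x_delayed) * dw)
    return path


def segment(path, n, lag):
    """Discretised state X_t in C at time t = n * dt (lag + 1 grid values)."""
    return path[n:n + lag + 1]
```

The Markov state is the whole segment returned by `segment`, not the single value `path[-1]`; this is what makes the phase space infinite dimensional.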

While existence of an invariant measure has been proven under natural sufficient conditions on the
functionals $f$ and $g$, the uniqueness question, as already mentioned in the introduction, has not been
answered up to now even in such simple cases as

\[
f(x) = -c\,x(0) , \qquad g(x) = \psi(x(-r)) , \qquad d = m = 1 ,
\]

for some $c > 0$ and $\psi$ a strictly positive, bounded and strictly increasing function [RRG06]. One
difficulty is that the corresponding Markov process on $\mathcal{C}$ is not strong Feller. Even worse: given the
solution $X_t$ for any $t > 0$, the initial condition $X_0$ can be recovered with probability one [Sch05].
Another peculiarity of such equations is that while they do in general generate a Feller semigroup
on $\mathcal{C}$, they often do not admit a modification which depends continuously on the initial condition,
even if $g$ is linear (see e.g. [Moh86] and [MS97]), so that they do not generate a stochastic flow of
homeomorphisms. We do nevertheless have the following uniqueness result:

Theorem 3.1 Assume that $m \ge d > 0$ and that, for every $\eta \in \mathcal{C}$, $g(\eta)$ admits a right inverse
$g^{-1}(\eta) \colon \mathbb{R}^d \to \mathbb{R}^m$. If $f$ is continuous and bounded on bounded subsets of $\mathcal{C}$, and for some $K \ge 1$, $f$
and $g$ satisfy

\[
\sup_{\eta \in \mathcal{C}} |g^{-1}(\eta)| < \infty
\]

and

\[
2\langle f(x) - f(y), x(0) - y(0)\rangle^+ + |||g(x) - g(y)|||^2 \le K \|x - y\|^2 ,
\]

where $|||M|||^2 = \mathrm{Tr}(MM^*)$, then the equation (3.1) has at most one invariant measure.

Remark 3.2 This assumption is sufficient to ensure the existence of (unique) global solutions (see
[RS08] for even weaker hypotheses).

Remark 3.3 Notice that Theorem 3.1 does not ensure that there exists an invariant measure, only that
there exists at most one. Even when there is no invariant (probability) measure, the ideas in Theorem 3.1
can be used to show that solutions with nearby initial conditions behave asymptotically the same with
positive probability.

We will use the following two lemmas, which are proven at the end of the section. The first one is
similar to Proposition 7.3 in [DPZ92].

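Lemma 3.4 Let $W(t)$, $t \ge 0$, be a standard Wiener process and fix $T > 0$ and $p > 2$. There exists a
function $\varrho \colon [0, \infty) \to [0, \infty)$ satisfying $\lim_{\lambda \to \infty} \varrho(\lambda) = 0$ such that the following holds: let $Y$ satisfy
the equation

\[
dY(t) = -\lambda Y(t)\,dt + h(t)\,dW(t), \qquad Y(0) = 0, \tag{3.2}
\]

where $h$ is an adapted process with almost surely càdlàg sample paths. Then for any stopping time $\tau$
we have

\[
\mathbf{E}\Bigl( \sup_{0 \le t \le \tau \wedge T} |Y(t)|^p \Bigr) \le \varrho(\lambda)\, \mathbf{E}\Bigl( \sup_{0 \le t \le \tau \wedge T} |h(t)|^p \Bigr) .
\]

Lemma 3.5 Let $\lambda > 0$ and consider the coupled set of equations

\[
\begin{aligned}
dX(t) &= f(X_t)\,dt + g(X_t)\,dW(t) , & X_0 &= \eta , \\
d\tilde X(t) &= f(\tilde X_t)\,dt + \lambda (X(t) - \tilde X(t))\,dt + g(\tilde X_t)\,dW(t) , & \tilde X_0 &= \tilde\eta ,
\end{aligned} \tag{3.3}
\]

and define $Z(t) := X(t) - \tilde X(t)$, $t \ge -r$. Then, for every $\gamma_0 > 0$ there exist $\lambda > 0$ and $C > 0$ such
that the bound $\mathbf{E}\bigl(\sup_{t \ge 0} e^{\gamma_0 t} \|Z_t\|\bigr)^8 \le (C\|Z_0\|)^8$ holds for any pair of initial conditions $X_0$ and $\tilde X_0$.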
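The coupling of Lemma 3.5 can be explored numerically. The sketch below drives two scalar delay equations with the same Brownian increments and adds the nudging term $\lambda(X(t) - \tilde X(t))$ to the second one, then records the gap $|Z(t)|$. It is a hedged illustration, not part of the argument: the coefficients ($f(x) = -c\,x(0)$, $g(x) = 1.5 + 0.1\tanh(x(-r))$), the constant initial segments, the value of $\lambda$ and the function name `coupled_gap` are assumptions chosen for the example.

```python
import math
import random


def coupled_gap(eta, eta_tilde, lam=5.0, r=0.5, c=1.0, T=10.0, dt=0.01, seed=1):
    """Simulate the coupled pair with a common noise and return |X - X~| on the grid.

    eta, eta_tilde are constants used as (constant) initial segments on [-r, 0].
    """
    rng = random.Random(seed)
    lag = round(r / dt)

    def psi(u):  # illustrative diffusion coefficient, Lipschitz constant 0.1
        return 1.5 + 0.1 * math.tanh(u)

    x = [eta] * (lag + 1)
    y = [eta_tilde] * (lag + 1)
    for _ in range(round(T / dt)):
        dw = rng.gauss(0.0, math.sqrt(dt))  # the SAME increment drives both equations
        x_new = x[-1] - c * x[-1] * dt + psi(x[-1 - lag]) * dw
        y_new = (y[-1] - c * y[-1] * dt
                 + lam * (x[-1] - y[-1]) * dt  # nudging term pulling X~ towards X
                 + psi(y[-1 - lag]) * dw)
        x.append(x_new)
        y.append(y_new)
    return [abs(a - b) for a, b in zip(x, y)]
```

In this toy setting the gap decays rapidly once $\lambda$ is large compared with the Lipschitz constant of the diffusion coefficient, which is the qualitative content of the lemma.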



Proof of Theorem 3.1. We begin by fixing two initial conditions $\eta, \tilde\eta \in \mathcal{C}$ of (3.3). We furthermore
define the "Girsanov shift" $v$ by

\[
v(t) = \lambda\, g(\tilde X_t)^{-1} (X(t) - \tilde X(t)) ,
\]

where $\lambda > 0$ is chosen as in Lemma 3.5 for $\gamma_0 = 1$ (say), and we set $\tau = \inf\{t \ge 0 : \int_0^t |v(s)|^2\,ds \ge
\varepsilon^{-1} \|\eta - \tilde\eta\|^2\}$, where $\varepsilon > 0$ is a small constant to be determined. Thanks to the non-degeneracy
assumption on $g$ and Lemma 3.5, we obtain $\lim_{\varepsilon \to 0} \mathbf{P}\{\tau = \infty\} = 1$ and $\lim_{t \to \infty} |X(t) - \tilde X(t)| = 0$
almost surely. In particular, there exists some $\varepsilon > 0$ independent of the initial conditions such that
$\mathbf{P}\{\tau = \infty\} > 0$. We will fix such a value of $\varepsilon$ from now on.

Setting $\tilde W(t) = W(t) + \int_0^{t \wedge \tau} v(s)\,ds$, we observe that the Cameron-Martin-Girsanov Theorem
implies that there exists a measure $\mathbf{Q}$ on $\Omega := C([0, \infty), \mathbb{R}^m)$ so that under $\mathbf{Q}$, $\tilde W$ is a standard Wiener
process on the time interval $[0, \infty)$. Let $\hat X$ be the solution of

\[
d\hat X(t) = f(\hat X_t)\,dt + g(\hat X_t)\,d\tilde W(t) , \qquad \hat X_0 = \tilde\eta .
\]

Since $\int_0^\tau |v(s)|^2\,ds \le \varepsilon^{-1} \|\eta - \tilde\eta\|^2$ by construction, the law of $\hat X$ is equivalent on $C([0, \infty), \mathbb{R}^d)$ to the law
of a solution to (3.1) with initial condition $\tilde\eta$. This means that the law of the pair $(X, \hat X)$ has marginals
which are equivalent on $C([0, \infty), \mathbb{R}^d)$ to solutions to (3.1) starting respectively from $\eta$ and $\tilde\eta$. Since
$\hat X = \tilde X$ on $\{\tau = \infty\}$, we have

\[
\lim_{t \to \infty} |\hat X(t) - X(t)| = \lim_{t \to \infty} |\tilde X(t) - X(t)| = 0 \quad \text{on } \{\tau = \infty\} \text{ a.s.}
\]

Therefore Corollary 2.2 implies that the discrete-time chain $(X_{rn})_{n \in \mathbb{N}}$ has at most one invariant
probability measure and hence the same is true for the Markov process $(X_t)_{t \ge 0}$. Since the space of
probability measures in which the couplings take their values, endowed with the topology of weak
convergence, is a Polish space [Vil03], and since all of the constants appearing in our explicit
construction can be chosen independently of the initial conditions, the map $(x, y) \mapsto \Gamma_{x,y}$
is indeed measurable. □



We now give the proof of Lemma 3.4, which was stated at the start of the section.

Proof of Lemma 3.4. We begin by noticing that we need only prove the lemma for the supremum over
a deterministic time interval $[0, T]$. The version over the random time interval follows by considering
the function $\tilde h(s) = h(s)\,1_{[0, \tau \wedge T)}(s)$. Observe that $\tilde h(s)$ again almost surely has càdlàg paths and, if $\tilde Y(t)$
is the solution to (3.2) with $h$ replaced by $\tilde h$, then

\[
\sup_{s \le \tau \wedge T} |Y(s)|^p = \sup_{s \le T} |\tilde Y(s)|^p \qquad \text{and} \qquad \sup_{s \le T} |\tilde h(s)|^p = \sup_{s < \tau \wedge T} |h(s)|^p .
\]

The second identity is clear from the definition of $\tilde h$. The first follows from the observation that $\tilde Y(s) =
Y(s)$ for $s \le \tau \wedge T$ and, for $s > \tau \wedge T$, $|\tilde Y(s)|$ only decreases since $\tilde h$ is identically zero. Hence it is
enough to prove the lemma over a deterministic time interval.

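We begin by observing that the solution $Y$ can be represented in the form

\[
Y(t) = e^{-\lambda t} \int_0^t e^{\lambda s} h(s)\,dW(s) . \tag{3.4}
\]

Therefore, using Burkholder's inequality and abbreviating $h^* := \sup_{0 \le t \le T} |h(t)|$, we obtain

\[
\mathbf{E}|Y(t)|^p = e^{-\lambda t p}\, \mathbf{E}\Bigl| \int_0^t e^{\lambda s} h(s)\,dW(s) \Bigr|^p
\le C_p\, e^{-\lambda t p}\, \mathbf{E}\Bigl| \int_0^t e^{2\lambda s} h^2(s)\,ds \Bigr|^{p/2}
\le C_p\, \mathbf{E}(h^*)^p\, (2\lambda)^{-p/2} . \tag{3.5}
\]

Let $N \in \mathbb{N}$ and define $t_k := t_k(N) := kT/N$ for $k = 0, \dots, N$ and

\[
I_k(t) = \int_{t_k}^t h(s)\,dW(s), \qquad t_k \le t \le t_{k+1}, \quad k = 0, \dots, N-1 .
\]

Notice that $I_k(t)$ is a local martingale with respect to the filtration it generates. Integrating (3.4) by parts,
we get

\[
Y(t) = Y(t_k)\, e^{-\lambda (t - t_k)} + I_k(t) - \lambda \int_{t_k}^t e^{-\lambda (t - s)} I_k(s)\,ds, \qquad t_k \le t \le t_{k+1} .
\]

In particular $|Y(t)| \le |Y(t_k)| + 2 \sup_{t_k \le s \le t_{k+1}} |I_k(s)|$ on $[t_k, t_{k+1}]$. Hence,

\[
\sup_{0 \le t \le T} |Y(t)|^p = \max_{k = 0, \dots, N-1}\ \sup_{t_k \le t \le t_{k+1}} |Y(t)|^p
\le 2^{p-1} \max_{k = 0, \dots, N-1} \Bigl[ |Y(t_k)|^p + 2^p \sup_{t_k \le t \le t_{k+1}} |I_k(t)|^p \Bigr] .
\]

Using Burkholder's inequality and (3.5), we get

\[
\begin{aligned}
\mathbf{E} \sup_{0 \le t \le T} |Y(t)|^p
&\le 2^{p-1} N C_p\, \mathbf{E}(h^*)^p (2\lambda)^{-p/2} + 2^{2p-1} N \max_{k = 0, \dots, N-1} \mathbf{E} \sup_{t_k \le t \le t_{k+1}} |I_k(t)|^p \\
&\le 2^{p-1} C_p\, \mathbf{E}(h^*)^p \bigl( N (2\lambda)^{-p/2} + 2^p\, T^{p/2} N^{1 - p/2} \bigr) .
\end{aligned}
\]

Since $p > 2$, the second term in the bracket tends to zero as $N \to \infty$. For each $\varepsilon > 0$, we can therefore
choose $N$ large enough such that the coefficient of $\mathbf{E}(h^*)^p$ becomes smaller than
$\varepsilon$ for all sufficiently large $\lambda$. □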
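The last step of the proof is a two-parameter choice: first $N$, then $\lambda$. The sketch below spells it out for a coefficient of the form $A\,N\,(2\lambda)^{-p/2} + B\,T^{p/2} N^{1-p/2}$, where $A$ and $B$ stand for the constants depending only on $p$ that appear in the bound. The numerical values of `A` and `B`, the doubling search and the function names are assumptions made for illustration; the Burkholder constant itself is not computed here.

```python
def coefficient(N, lam, p=4.0, T=1.0, A=1.0, B=1.0):
    """Coefficient of E(h*)^p: A * N * (2 lam)^(-p/2) + B * T^(p/2) * N^(1 - p/2)."""
    return A * N * (2.0 * lam) ** (-p / 2) + B * T ** (p / 2) * N ** (1 - p / 2)


def choose_N_and_lambda(eps, p=4.0, T=1.0, A=1.0, B=1.0):
    """Pick N, then lam, so that each of the two terms is below eps / 2.

    Requires p > 2, so that N^(1 - p/2) tends to zero as N grows.
    """
    N = 1
    while B * T ** (p / 2) * N ** (1 - p / 2) >= eps / 2:
        N *= 2          # second term does not depend on lam: fix N first
    lam = 1.0
    while A * N * (2.0 * lam) ** (-p / 2) >= eps / 2:
        lam *= 2.0      # first term decreases in lam once N is fixed
    return N, lam
```

The order matters: the first term grows with $N$, so $\lambda$ can only be chosen after $N$ has been fixed, and the bound then holds for every larger $\lambda$ as well.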

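We conclude with the proof of Lemma 3.5.

Proof of Lemma 3.5. First observe that the pair of equations (3.3) admits a unique global solution (see
[RS08]). Setting $Z(t) = X(t) - \tilde X(t)$, we see that

\[
\begin{aligned}
d|Z(t)|^2 &= 2\langle f(X_t) - f(\tilde X_t), Z(t)\rangle\,dt + |||g(X_t) - g(\tilde X_t)|||^2\,dt - 2\lambda |Z(t)|^2\,dt + dM(t) \\
&\le K \|Z_t\|^2\,dt - 2\lambda |Z(t)|^2\,dt + dM(t) ,
\end{aligned}
\]

where $M(0) = 0$ and $dM(t) = 2\langle Z(t), (g(X_t) - g(\tilde X_t))\,dW(t)\rangle$. Define now $Y(t) = e^{\alpha t} |Z(t)|^2$ for a
constant $\alpha$ to be determined later. Then

\[
\begin{aligned}
dY(t) &= \alpha Y(t)\,dt + e^{\alpha t}\,d|Z(t)|^2 \\
&\le (\alpha - 2\lambda) Y(t)\,dt + K e^{\alpha t} \|Z_t\|^2\,dt + e^{\alpha t}\,dM(t) \\
&\le (\alpha - 2\lambda) Y(t)\,dt + K e^{\alpha r} \|Y_t\|\,dt + e^{\alpha t}\,dM(t) .
\end{aligned}
\]

Setting $\kappa = 2\lambda - \alpha$ and $N(t) = \int_0^t e^{-\kappa (t - s)} e^{\alpha s}\,dM(s)$, the variation of constants formula thus yields

\[
\begin{aligned}
Y(t) &\le e^{-\kappa t} Y(0) + K e^{\alpha r} \int_0^t e^{-\kappa (t - s)} \|Y_s\|\,ds + N(t) \\
&\le e^{-\kappa t} Y(0) + \frac{K e^{\alpha r}}{\kappa} \sup_{s \in [0, t]} \|Y_s\| + N(t) .
\end{aligned}
\]

For $\varepsilon > 0$, let now $\tau_\varepsilon$ be the stopping time defined by $\tau_\varepsilon = 2r \wedge \inf\{t \ge 0 : \|Y_t\| \ge \varepsilon^{-1}\}$. It follows
that there exists a constant $\bar K$ independent of $\alpha$, $\lambda$ and $\varepsilon$ such that

\[
\mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \|Y_s\|^4 \le \bar K \|Y_0\|^4 + \frac{\bar K e^{4\alpha r}}{\kappa^4}\, \mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \|Y_s\|^4 + \bar K\, \mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} |N(s)|^4 .
\]

Now observe that by Lemma 3.4 (applied with $\kappa$ in place of $\lambda$) we have for $N$ the bound

\[
\mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} |N(s)|^4 \le \varrho(\kappa)\, 2^4\, \mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \bigl( e^{4\alpha s} |Z(s)|^4\, |||g(X_s) - g(\tilde X_s)|||^4 \bigr) \le C e^{2\alpha r} \varrho(\kappa)\, \mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \|Y_s\|^4 ,
\]

for a constant $C$ independent of $\alpha$ and $\lambda$. This shows that we can find a function $\alpha \mapsto \Lambda(\alpha)$ such that,
with $\kappa = 2\Lambda(\alpha) - \alpha$, both $\bar K e^{4\alpha r} / \kappa^4 \le \frac14$ and $\bar K C e^{2\alpha r} \varrho(\kappa) \le \frac14$, thus obtaining

\[
\mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \|Y_s\|^4 \le \bar K \|Y_0\|^4 + \frac12\, \mathbf{E} \sup_{s \in [0, \tau_\varepsilon]} \|Y_s\|^4 ,
\]

provided that we choose $\lambda = \Lambda(\alpha)$. Since this bound is independent of $\varepsilon > 0$, we can take the limit
$\varepsilon \to 0$, so that the monotone convergence theorem yields $\mathbf{E} \sup_{s \in [0, 2r]} \|Y_s\|^4 \le 2\bar K \|Y_0\|^4$. In terms of
our original process $Z$, we conclude that

\[
\mathbf{E}\|Z_r\|^8 \le 2\bar K \|Z_0\|^8 \qquad \text{and} \qquad \mathbf{E}\|Z_{2r}\|^8 \le 2\bar K e^{-4\alpha r} \|Z_0\|^8 . \tag{3.6}
\]

Since $\bar K$ is independent of $\alpha$, we can ensure that $2\bar K e^{-4\alpha r} \le e^{-18 r \gamma_0}$ by taking $\alpha$ (and therefore also
$\lambda$) sufficiently large. Iterating (3.6), we obtain

\[
\mathbf{E}\|Z_{2nr}\|^8 \le e^{-18 \gamma_0 r n} \|Z_0\|^8 \qquad \text{and} \qquad \mathbf{E}\|Z_{(2n+1)r}\|^8 \le 2\bar K e^{-18 \gamma_0 r n} \|Z_0\|^8 . \tag{3.7}
\]

Note now that if $t \in [nr, (n+1)r]$, then $\|Z_t\| \le \|Z_{nr}\| + \|Z_{(n+1)r}\|$. Therefore, there exists a constant
$C > 0$ such that

\[
\sup_{t \ge 0} e^{8\gamma_0 t} \|Z_t\|^8 \le C \sum_{n=0}^\infty e^{8\gamma_0 r n} \|Z_{rn}\|^8 .
\]

Hence using (3.7), we have for a different constant $C > 0$

\[
\mathbf{E} \sup_{t \ge 0} e^{8\gamma_0 t} \|Z_t\|^8 \le C \|Z_0\|^8 \sum_{n=0}^\infty e^{-\gamma_0 r n} .
\]

Since the sum on the right hand side converges, the proof is complete. □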

This shows the uniqueness of the invariant measure for a large class of stochastic delay equations. 
It turns out that under exactly the same conditions, we can ensure that the invariant measure is not only 
unique, but that transition probabilities converge to it. 

3.2 Convergence of transition probabilities of SDDEs 

In this section we will apply the abstract results of Section 2.2 to the $\mathcal{C}$-valued Markov process $(X_t)$
which we introduced in the previous section. We will denote its transition probabilities by $\mathcal{P}_t(\eta, \cdot)$. We
will prove the following result:

Theorem 3.6 Let the assumptions of Theorem 3.1 be satisfied. If the Markov process $X_t$, $t \ge 0$, admits
an invariant probability measure $\mu$, then for each $\eta \in \mathcal{C}$ we have $\mathcal{P}_t(\eta, \cdot) \to \mu$ weakly.

We start with the following lemma (which we will also need in Section 5):

Lemma 3.7 Let the assumptions of Theorem 3.1 be satisfied and denote by $B_R$ the closed ball in $\mathcal{C}$ with
radius $R$ and center 0. Then for each $R, \delta > 0$ and each $t_* \ge 2r$,

\[
\inf_{\|\eta\| \le R} \mathcal{P}_{t_*}(\eta, B_\delta) > 0 .
\]

Proof. The proof resembles that of Lemma 2.4 in [SS02]. Fix $R > \delta > 0$. For each $y \in \mathbb{R}^d$, $|y| \le
3R/2$, let $h = h_y \colon [0, t_*] \to \mathbb{R}^d$ be continuously differentiable with Lipschitz constant at most $2R/r$
and satisfy $h = 0$ on $[r, t_*]$, $h(0) = y$. Define

\[
D(t) := |X(t) - h(t)|^2 - (\delta/2)^2 ,
\]

where $X$ solves the SDDE (3.1) with initial condition $\eta \in \mathcal{C}$, $\|\eta\| \le R$, $y := \eta(0) - (\delta/2, 0, \dots, 0)^T$ and
$h = h_y$ is defined as above. Then

\[
dD(t) = 2\langle X(t) - h(t), f(X_t) - h'(t)\rangle\,dt + 2\langle X(t) - h(t), g(X_t)\,dW(t)\rangle + \sum_{i,j} g_{ij}^2(X_t)\,dt ,
\]

while $D(0) = 0$. Let $\tau := \inf\{t \ge 0 : |D(t)| \ge (\delta/4)^2\}$. Let now $\bar W$ be a Wiener process that is
independent of $W$ and set

\[
Y(t) := D(t \wedge \tau) + (\bar W(t) - \bar W(\tau))\, 1_{t \ge \tau} .
\]

This is a semimartingale with $Y(0) = 0$ which fulfills the conditions of Lemma 1.8.3 of [Bas98] (with
$(\delta/4)^2$ in place of $\varepsilon$). Therefore, there exists $p > 0$ such that for all $\|\eta\| \le R$ we have

\[
\mathcal{P}_{t_*}(\eta, B_\delta) \ge \mathbf{P}\Bigl( \sup_{0 \le t \le t_*} |Y(t)| < (\delta/4)^2 \,\Big|\, X_0 = \eta \Bigr) \ge p ,
\]

thus concluding the proof. □



Proof of Theorem 3.6. Theorem 2.4 is formulated for discrete time, so we first show that the two
conditions are satisfied for the Markov kernel $\mathcal{P}_t$ for some $t > 0$. The previous lemma immediately implies
that the first condition of Theorem 2.4 is satisfied for any $t > 0$ and any sufficiently large $k$. The
second condition follows from the fact that there exists (a small value) $\delta > 0$ such that the metric
$d(x, y) := 1 \wedge \delta^{-1} \|x - y\|$ on $\mathcal{C}$ is contracting for $\mathcal{P}_t$ (see Definition 4.5) for any sufficiently large $t > 0$
(which is proved in Section 5.1) and Proposition 4.13. Since the support of $\mu$ equals $\mathcal{C}$, it therefore
follows from Theorem 2.4 that for some suitable $t$, all transition probabilities of the chain associated to
$\mathcal{P}_t$ converge to $\mu$ weakly. To show that even all transition probabilities of the continuous-time Markov
process $(X_t)$ converge to $\mu$ weakly, it suffices to observe that (by Proposition 5.3) there exists a constant
$C$ such that $d(\mathcal{P}_\tau \nu, \mathcal{P}_\tau \tilde\nu) \le C\, d(\nu, \tilde\nu)$ for all $\tau \in [0, t]$ and all $\nu, \tilde\nu$. □

Remark 3.8 It is not true in general that one has exponential convergence under the assumptions of
Theorem 3.1 (plus the existence of an invariant measure) alone. Consider for example the one-dimensional
SDE

\[
dx = -\frac{x}{(1 + x^2)^\alpha}\,dt + dW(t) .
\]

It is known that for $\alpha \in (\tfrac12, 1)$ it has a unique invariant measure, but that convergence of transition
probabilities is only stretched exponential [Hai09]. However, it does satisfy the one-sided Lipschitz
condition of Theorem 3.1.



4 A weak form of Harris' theorem 



In this section, we show that under very mild additional assumptions on the dynamic of (3.1), the
uniqueness result for an invariant measure obtained in the previous section can be strengthened to an
exponential convergence result in a type of weighted Wasserstein distance. Our main ingredient will
be the existence of a Lyapunov function for our system. Recall that a Lyapunov function for a Markov
semigroup $\{\mathcal{P}_t\}_{t \ge 0}$ over a Polish space $X$ is a function $V \colon X \to [0, \infty]$ such that $V$ is integrable with
respect to $\mathcal{P}_t(x, \cdot)$ for every $x \in X$ and $t \ge 0$ and such that there exist constants $C_V, \gamma, K_V > 0$ such
that the bound

\[
\int V(y)\, \mathcal{P}_t(x, dy) \le C_V e^{-\gamma t} V(x) + K_V \tag{4.1}
\]

holds for every $x \in X$ and $t \ge 0$. In the usual theory of stability for Markov processes, the notion of a
"small set" plays an equally important role. We say that a set $A \subset X$ is small if there exists a time $t > 0$
and a constant $\varepsilon > 0$ such that

\[
\|\mathcal{P}_t(x, \cdot) - \mathcal{P}_t(y, \cdot)\|_{\mathrm{TV}} \le 1 - \varepsilon \tag{4.2}
\]

for every $x, y \in A$. Recall that the total variation distance between two probability measures is equal
to 1 if and only if the two measures are mutually singular. A set is therefore small if the transition
probabilities starting from any two points in the set have a "common part" of mass at least $\varepsilon$. The
classical Harris theorem [MT93, HM08b] then states that:



Theorem 4.1 (Harris) Let $\mathcal{P}_t$ be a Markov semigroup over a Polish space $X$ such that there exists a
Lyapunov function $V$ with the additional property that the level sets $\{x : V(x) \le C\}$ are small for
every $C > 0$. Then, $\mathcal{P}_t$ has a unique invariant measure $\mu_*$ and $\|\mathcal{P}_t(x, \cdot) - \mu_*\|_{\mathrm{TV}} \le C e^{-\gamma_* t} (1 + V(x))$
for some positive constants $C$ and $\gamma_*$.

The proof of Harris' theorem is based on the fact that a semigroup satisfying these assumptions has
a spectral gap in a modified total variation distance, where the variation is weighted by the Lyapunov
function $V$. This theorem can clearly not be applied to Markov semigroups generated by stochastic
delay equations in general. As already mentioned earlier, it is indeed known that even in simple cases
where the diffusion coefficient $g$ only depends on the past of the process, the initial condition can be
recovered exactly from the solution at any subsequent time. This implies that in such a case

\[
\|\mathcal{P}_t(x, \cdot) - \mathcal{P}_t(y, \cdot)\|_{\mathrm{TV}} = 1
\]

for every $x \ne y$ and every $t > 0$, so that (4.2) fails. We would therefore like to replace the notion of
a small set (4.2) by a notion of "closeness" between transition probabilities that reflects the topology
of the underlying space $X$. Before we state our modified notion of a $d$-small set, we introduce another
notation: given a positive function $d \colon X \times X \to \mathbb{R}_+$, we extend it to a positive function $d \colon \mathcal{M}_1(X) \times
\mathcal{M}_1(X) \to \mathbb{R}_+$, where $\mathcal{M}_1(X)$ stands for the set of (Borel) probability measures on $X$, by

\[
d(\mu, \nu) = \inf_{\pi \in \mathcal{C}(\mu, \nu)} \int d(x, y)\, \pi(dx, dy) . \tag{4.3}
\]

If $d$ is a metric, then its extension to $\mathcal{M}_1(X)$ is simply the corresponding Wasserstein-1 distance. In this
section, we will be considering functions $d \colon X \times X \to \mathbb{R}_+$ that are not necessarily metrics but that are
"distance-like" in the following sense:

Definition 4.2 Given a Polish space $X$, a function $d \colon X \times X \to \mathbb{R}_+$ is distance-like if it is symmetric,
lower semi-continuous, and such that $d(x, y) = 0 \Leftrightarrow x = y$.

Even though we think of $d$ as being a kind of metric, it need not satisfy the triangle inequality.
However, when lifted to the space of probability measures, $d$ provides a reasonable way of measuring
distances between measures in the sense that $d(\mu, \nu) \ge 0$ and $d(\mu, \nu) = 0 \Leftrightarrow \mu = \nu$, the latter property
being a consequence of the lower semi-continuity of $d$. The lower semi-continuity of $d$ also ensures that
the infimum in (4.3) is always reached by some coupling $\pi$. With this notation at hand, we set:

Definition 4.3 Let $\mathcal{P}$ be a Markov operator over a Polish space $X$ endowed with a distance-like function
$d \colon X \times X \to [0, 1]$. A set $A \subset X$ is said to be $d$-small if there exists $\varepsilon > 0$ such that

\[
d(\mathcal{P}(x, \cdot), \mathcal{P}(y, \cdot)) \le 1 - \varepsilon \tag{4.4}
\]

for every $x, y \in A$.
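For measures with finitely many atoms, the lifted distance (4.3) can be evaluated by hand or by brute force. The sketch below treats the special case of two uniform measures on $n$ points each, where the infimum over couplings is attained at a permutation (by the Birkhoff-von Neumann theorem, since the objective is linear over the doubly stochastic matrices). The truncated distance `d_capped` and the function name `lifted_distance` are illustrative assumptions; the brute-force search is only practical for very small $n$.

```python
from itertools import permutations


def d_capped(x, y):
    """An example distance-like function with values in [0, 1]: 1 ∧ |x - y|."""
    return min(1.0, abs(x - y))


def lifted_distance(xs, ys, d):
    """d(mu, nu) for mu uniform on the points xs and nu uniform on the points ys.

    Both lists must have the same length n; each atom carries mass 1/n.
    Minimises the transport cost over all n! pairings of the atoms.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both measures need the same number of atoms")
    return min(
        sum(d(xs[i], ys[sigma[i]]) for i in range(n)) / n
        for sigma in permutations(range(n))
    )
```

For instance, `lifted_distance([0.0, 10.0], [0.5, 10.5], d_capped)` pairs each atom with its near neighbour, while two Dirac masses far apart are at lifted distance 1, the largest possible value.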

Remark 4.4 If $d(x, y) = d_{\mathrm{TV}}(x, y) := 1_{x \ne y}$, then the notion of a $d$-small set coincides with the notion
of a small set given in the introduction, since $\|\mu - \nu\|_{\mathrm{TV}} = d_{\mathrm{TV}}(\mu, \nu)$.
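The identity in Remark 4.4 can be checked directly on a finite state space. With the normalisation used here (total variation distance 1 for mutually singular measures), the distance is half the $\ell^1$ distance between the mass functions, and the optimal coupling for the cost $1_{x \ne y}$ keeps the common mass $\sum_k \min(p_k, q_k)$ on the diagonal. The dictionary representation of measures and the two function names are assumptions made for this small sketch.

```python
def tv_distance(p, q):
    """Total variation distance between two probability mass functions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def optimal_mismatch(p, q):
    """Lifted distance for d(x, y) = 1_{x != y}: one minus the common mass."""
    keys = set(p) | set(q)
    return 1.0 - sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)
```

Two measures with disjoint supports have no common mass, so both functions return 1, which is exactly the situation that prevents small sets from existing for the delay equations discussed above.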

In general, it is clear that having a Lyapunov function $V$ with $d$-small level sets cannot be sufficient to
imply the unique ergodicity of a Markov semigroup. A simple example is given by the Glauber dynamic
of the 2D Ising model, which exhibits two distinct ergodic invariant measures at low temperatures, but
for which every set is $d$-small if $d$ is a distance function that metrises the product topology on the state
space $\{0, 1\}^{\mathbb{Z}^2}$, for example $d(\sigma, \sigma') = \sum_{k \in \mathbb{Z}^2} 2^{-|k|} |\sigma_k - \sigma'_k|$.

This shows that if we wish to make use of the notion of a $d$-small set, we should impose additional
assumptions on the function $d$. One feature that distinguishes the total variation distance $d_{\mathrm{TV}}$ among
other distance-like functions is that, for any Markov operator $\mathcal{P}$, one always has the contraction property

\[
d_{\mathrm{TV}}(\mathcal{P}\mu, \mathcal{P}\nu) \le d_{\mathrm{TV}}(\mu, \nu) .
\]

It is therefore natural to look for distance-like functions with a similar property. This motivates the
following definition:

Definition 4.5 Let $\mathcal{P}$ be a Markov operator over a Polish space $X$ endowed with a distance-like function
$d \colon X \times X \to [0, 1]$. The function $d$ is said to be contracting for $\mathcal{P}$ if there exists $\alpha < 1$ such that the
bound

\[
d(\mathcal{P}(x, \cdot), \mathcal{P}(y, \cdot)) \le \alpha\, d(x, y) \tag{4.5}
\]

holds for every pair $x, y \in X$ with $d(x, y) < 1$.

Remark 4.6 The assumption that $d$ takes values in $[0, 1]$ is not a restriction at all. One can indeed check
that if an unbounded function $d$ is contracting for a Markov operator $\mathcal{P}$ and $A$ is a $d$-small set, then the
same statements are true for $d$ replaced by $d \wedge 1$.

It may seem at first sight that (4.5) alone is already sufficient to guarantee the convergence of
transition probabilities toward a unique invariant measure. A little more thought shows that this is not the
case, since the total variation distance $d_{\mathrm{TV}}$ is contracting for every Markov semigroup. The point here is
that (4.5) says nothing about the pairs $(x, y)$ with $d(x, y) = 1$, and this set may be very large. However,
combined with the existence of a Lyapunov function $V$ that has $d$-small level sets, it turns out that this
contraction property is sufficient not only for the existence and uniqueness of the invariant measure $\mu_*$,
but even for having exponential convergence of transition probabilities to $\mu_*$ in a type of Wasserstein
distance:

Theorem 4.7 Let $\mathcal{P}_t$ be a Markov semigroup over a Polish space $X$ admitting a continuous Lyapunov
function $V$. Suppose furthermore that there exists $t_* > 0$ and a distance-like function $d \colon X \times X \to [0, 1]$
which is contracting for $\mathcal{P}_{t_*}$ and such that the level set $\{x \in X : V(x) \le 4K_V\}$ is $d$-small for $\mathcal{P}_{t_*}$.
(Here $K_V$ is as in (4.1).)

Then, $\mathcal{P}_t$ can have at most one invariant probability measure $\mu_*$. Furthermore, defining $\tilde d(x, y) =
\sqrt{d(x, y)(1 + V(x) + V(y))}$, there exists $t > 0$ such that

\[
\tilde d(\mathcal{P}_t \mu, \mathcal{P}_t \nu) \le \tfrac12\, \tilde d(\mu, \nu) \tag{4.6}
\]

for all pairs of probability measures $\mu$ and $\nu$ on $X$.



Remark 4.8 In the special case $d = d_{\mathrm{TV}}$, we simply recover Harris' theorem, as stated for example in
[HM08b], so that this is a genuinely stronger statement. It is in this sense that Theorem 4.7 is a "weak"
version of Harris' theorem where the notion of a "small set" has been replaced by the notion of a $d$-small
set for a contracting distance-like function $d$. The only small difference is that Harris' theorem tells us
that the Markov semigroup $\mathcal{P}_t$ exhibits a spectral gap in a total variation norm weighted by $1 + V$,
whereas we obtain a spectral gap in a total variation norm weighted by $1 + \sqrt{V}$. This is because the
proof of Harris' theorem does not require the "close to each other" step (since if $d(x, y) < 1$, one has
$x = y$ and the estimate is trivial), so that we never need to apply the Cauchy-Schwarz inequality.

Proof of Theorem 4.7. Before we start the proof itself, we note that we can assume without loss of
generality that $t_* > \log(8C_V)/\gamma$, so that

\[
\mathcal{P}_{t_*} V \le \tfrac18 V + K_V . \tag{4.7}
\]

This is a simple consequence of the following two facts that can be checked in a straightforward way
from the definitions:

• If $d$ is contracting for two Markov operators $\mathcal{P}$ and $\mathcal{Q}$, then it is also contracting for the product
  $\mathcal{P}\mathcal{Q}$. (Actually it is sufficient for $d$ to be contracting for $\mathcal{P}$ and to have (4.5) with $\alpha = 1$ for $\mathcal{Q}$.)

• If a set $A$ is $d$-small for $\mathcal{Q}$ and $d$ is contracting for $\mathcal{P}$, then $A$ is also $d$-small for $\mathcal{P}\mathcal{Q}$.

Note also that the function $\tilde d \colon \mathcal{M}_1(X) \times \mathcal{M}_1(X) \to \mathbb{R}_+$ is convex in each of its arguments, so that the
bound

\[
\tilde d(\mathcal{P}_t \mu, \mathcal{P}_t \nu) \le \int_{X \times X} \tilde d(\mathcal{P}_t(x, \cdot), \mathcal{P}_t(y, \cdot))\, \pi(dx, dy)
\]

is valid for any coupling $\pi \in \mathcal{C}(\mu, \nu)$. As a consequence, in order to show (4.6), it is sufficient to show
that it holds in the particular case where $\mu$ and $\nu$ are Dirac measures. In other words, it is sufficient to
show that there exists $t > 0$ and $\alpha' < 1$ such that

\[
\tilde d(\mathcal{P}_t(x, \cdot), \mathcal{P}_t(y, \cdot)) \le \alpha'\, \tilde d(x, y) \tag{4.8}
\]

for every $x, y \in X$. Note also that (4.6) is sufficient to conclude that $\mathcal{P}_t$ can have at most one invariant
measure by the following argument. Since $V$ is a Lyapunov function for $\mathcal{P}_t$, it is integrable with respect
to any invariant measure so that, if $\mu$ and $\nu$ are any two such measures, one has $\tilde d(\mu, \nu) < \infty$. It then
follows immediately from (4.6) and from the invariance of $\mu, \nu$ that $\tilde d(\mu, \nu) = 0$. It follows from the
lower semi-continuity of $d$ that $\mu = \nu$ as required.

In order to show that (4.8) holds, we make use of a trick similar to the one used in [HM08a]. For
$\beta > 0$ a (small) parameter to be determined later, we set

\[
d_\beta(x, y) = \sqrt{d(x, y)(1 + \beta V(x) + \beta V(y))} .
\]

Note that, because of the positivity of $V$, there exist constants $c$ and $C$ (depending on $\beta$ of course) such
that $c\, \tilde d(x, y) \le d_\beta(x, y) \le C\, \tilde d(x, y)$. As a consequence, if we can show (4.8) for $d_\beta$ with some value of
the parameter $\beta$, then it also holds for $\tilde d$ by possibly considering a larger time $t$. Just as in [HM08a], we
now proceed by showing that $\beta$ can be tuned in such a way that (4.8) holds, whether $x$ and $y$ are "close
to each other," "far from the origin" or "close to the origin."

Close to each other. This is the situation where $d(x, y) < 1$, so that

\[
d_\beta^2(x, y) = d(x, y)(1 + \beta V(x) + \beta V(y)) .
\]

In this situation, we make use of the contractivity of $d$, the fact that $V$ is a Lyapunov function, and the
Cauchy-Schwarz inequality to obtain

\[
\begin{aligned}
\bigl( d_\beta(\mathcal{P}_{t_*}(x, \cdot), \mathcal{P}_{t_*}(y, \cdot)) \bigr)^2
&\le \inf_\pi \int d(x', y')\, \pi(dx', dy') \int (1 + \beta V(x') + \beta V(y'))\, \pi(dx', dy') \\
&\le \alpha\, d(x, y) \bigl( 1 + \tfrac{\beta}{8}(V(x) + V(y)) + 2\beta K_V \bigr) ,
\end{aligned}
\]

where the infimum runs over all $\pi \in \mathcal{C}(\mathcal{P}_{t_*}(x, \cdot), \mathcal{P}_{t_*}(y, \cdot))$. For any given $\alpha_1 \in (\alpha, 1)$, we can
furthermore choose $\beta$ sufficiently small such that $\alpha(1 + 2\beta K_V) \le \alpha_1$, so that

\[
\bigl( d_\beta(\mathcal{P}_{t_*}(x, \cdot), \mathcal{P}_{t_*}(y, \cdot)) \bigr)^2 \le \alpha_1\, d_\beta^2(x, y) .
\]

Far from the origin. This is the situation where $d(x, y) \ge 1$ and $V(x) + V(y) \ge 4K_V$, so that

\[
d_\beta^2(x, y) = 1 + \beta(V(x) + V(y)) \ge 1 + 3\beta K_V + \tfrac{\beta}{4}(V(x) + V(y)) .
\]

Using the Lyapunov structure (4.1), we thus get

\[
\begin{aligned}
\bigl( d_\beta(\mathcal{P}_{t_*}(x, \cdot), \mathcal{P}_{t_*}(y, \cdot)) \bigr)^2
&\le 1 + 2\beta K_V + C_V \beta e^{-\gamma t_*}(V(x) + V(y)) \le 1 + 2\beta K_V + \tfrac{\beta}{8}(V(x) + V(y)) \\
&\le \max\Bigl\{ \frac{1 + 2\beta K_V}{1 + 3\beta K_V}, \frac12 \Bigr\}\, d_\beta^2(x, y) =: \alpha_2\, d_\beta^2(x, y) ,
\end{aligned}
\]

where we made use again of (4.7). While $\alpha_2$ depends on the choice of $\beta$, we see that for any fixed
$\beta > 0$, one has $\alpha_2 < 1$.

Close to the origin. This is the final situation where $d(x, y) = 1$ and $V(x) + V(y) < 4K_V$, so that
$d_\beta(x, y) \ge 1$. In this case, we make use of the fact that the level set $\{x : V(x) \le 4K_V\}$ is assumed to
be $d$-small to conclude that there exists a coupling $\pi$ for $\mathcal{P}_{t_*}(x, \cdot)$ and $\mathcal{P}_{t_*}(y, \cdot)$ and a constant $\varepsilon > 0$ such
that $\int d\, d\pi \le 1 - \varepsilon$, so that

\[
\begin{aligned}
\bigl( d_\beta(\mathcal{P}_{t_*}(x, \cdot), \mathcal{P}_{t_*}(y, \cdot)) \bigr)^2
&\le \int d(x', y')\, \pi(dx', dy') \int (1 + \beta V(x') + \beta V(y'))\, \pi(dx', dy') \\
&\le (1 - \varepsilon)\bigl( 1 + 2\beta K_V + \beta C_V e^{-\gamma t_*}(V(x) + V(y)) \bigr) \le (1 - \varepsilon)(1 + 4\beta K_V)\, d_\beta^2(x, y) ,
\end{aligned}
\]

where we made again use of (4.7). Here, $\varepsilon > 0$ is independent of $\beta$. Therefore, choosing $\beta$ sufficiently
small (for example $\beta = \varepsilon/(4K_V)$), we can again make sure that the constant appearing in this expression
is strictly smaller than 1, thus concluding the proof of Theorem 4.7. □
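The bookkeeping of the three cases can be summarised in a few lines. Given the contraction factor $\alpha$ of $d$, the constant $\varepsilon$ from the $d$-small level set and the constant $K_V$ from (4.1), the sketch below picks a value of $\beta$ and returns the three factors bounding the squared distance $d_\beta^2$ in the "close to each other", "far from the origin" and "close to the origin" regimes. The specific rule used for $\beta$ (the smaller of $\varepsilon$ and $(1/\alpha - 1)/2$, divided by $4K_V$) and the function name are assumptions chosen so that all three factors are visibly below 1; the proof only requires $\beta$ to be sufficiently small.

```python
def contraction_constants(alpha, eps, K_V):
    """Return (beta, a1, a2, a3) for 0 < alpha < 1, 0 < eps <= 1 and K_V > 0.

    a1, a2, a3 bound the ratio of squared d_beta distances after and before
    one step in the three regimes of the proof.
    """
    # beta small enough for both the first and the third regime
    beta = min(eps, (1.0 / alpha - 1.0) / 2.0) / (4.0 * K_V)
    a1 = alpha * (1.0 + 2.0 * beta * K_V)                      # close to each other
    a2 = max((1.0 + 2.0 * beta * K_V) / (1.0 + 3.0 * beta * K_V), 0.5)  # far from origin
    a3 = (1.0 - eps) * (1.0 + 4.0 * beta * K_V)                # close to the origin
    return beta, a1, a2, a3
```

Since the three factors bound squared distances, the contraction factor for $d_\beta$ itself is the square root of their maximum.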






Remark 4.9 If the assumptions of the theorem hold uniformly for $t_*$ belonging to an open interval of
times, then one can check that Theorem 4.7 implies that there exists $r > 0$ and $t_0 > 0$ such that the
bound

\[
\tilde d(\mathcal{P}_t \mu, \mathcal{P}_t \nu) \le e^{-rt}\, \tilde d(\mu, \nu)
\]

holds for all $t > t_0$, instead of multiples of $t$ only.

If $d$ is somewhat comparable to a metric, it turns out that we can even infer the existence of an
invariant measure from the assumptions of Theorem 4.7, just like in the case of Harris' theorem:

Corollary 4.10 If there exists a complete metric $d_0$ on $X$ such that $d_0 \le \sqrt{d}$ and such that $\mathcal{P}_t$ is Feller
on $X$, then under the assumptions of Theorem 4.7, there exists a unique invariant measure $\mu_*$ for $\mathcal{P}_t$.

Proof. It only remains to show that an invariant measure exists for $\mathcal{P}_t$. Fix an arbitrary probability
measure $\mu$ on $X$ such that $\int V\,d\mu < \infty$ and let $t$ be the time obtained from Theorem 4.7. Since
$\tilde d \ge \sqrt{d} \ge d_0$ by assumption and since $\tilde d(\mu, \mathcal{P}_t \mu) < \infty$ by (4.1), it then follows from (4.8) that

\[
d_0(\mathcal{P}_{nt}\mu, \mathcal{P}_{(n+1)t}\mu) \le \tilde d(\mathcal{P}_{nt}\mu, \mathcal{P}_{(n+1)t}\mu) \le \frac{\tilde d(\mu, \mathcal{P}_t \mu)}{2^n} ,
\]

so that the sequence $\{\mathcal{P}_{nt}\mu\}_{n \ge 0}$ is Cauchy in the space of probability measures on $X$ endowed with the
Wasserstein-1 distance associated to $d_0$. Since this space is complete [Vil03], there exists $\mu_\infty$ such that
$\mathcal{P}_{nt}\mu \to \mu_\infty$ weakly. In particular, the Feller property of $\mathcal{P}_t$ implies $\mathcal{P}_t \mu_\infty = \mu_\infty$ so that, defining $\mu_*$
by

\[
\mu_*(A) = \frac1t \int_0^t (\mathcal{P}_s \mu_\infty)(A)\,ds ,
\]

one can check that $\mathcal{P}_r \mu_* = \mu_*$ for every $r > 0$ as required. □



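One standard way of using a "spectral gap" result like Theorem 4.7 is to obtain the stability of
the invariant measure with respect to small perturbations of the dynamic. Assume for the sake of the
argument that $\tilde d$ satisfies the triangle inequality (in general it doesn't; see below) and that we have a
sequence of "approximating semigroups" $\mathcal{P}_t^\varepsilon$ such that the bound

\[
\tilde d(\mathcal{P}_t^\varepsilon(x, \cdot), \mathcal{P}_t(x, \cdot)) \le \varepsilon\, C(t)\, V(x)
\]

holds, where $C$ is a function that is bounded on bounded subsets of $\mathbb{R}$ and $V$ is some positive function
$V \colon X \to \mathbb{R}_+$.

Let now $\mu_*$ denote the invariant measure for $\mathcal{P}_t$ and $\mu_*^\varepsilon$ an invariant measure for $\mathcal{P}_t^\varepsilon$ (which need not
be unique). Choosing $t$ as in (4.6), one then has the bound

\[
\begin{aligned}
\tilde d(\mu_*, \mu_*^\varepsilon) = \tilde d(\mathcal{P}_t \mu_*, \mathcal{P}_t^\varepsilon \mu_*^\varepsilon)
&\le \tilde d(\mathcal{P}_t \mu_*, \mathcal{P}_t \mu_*^\varepsilon) + \tilde d(\mathcal{P}_t \mu_*^\varepsilon, \mathcal{P}_t^\varepsilon \mu_*^\varepsilon) \\
&\le \tfrac12\, \tilde d(\mu_*, \mu_*^\varepsilon) + \varepsilon\, C(t) \int_X V(x)\, \mu_*^\varepsilon(dx) ,
\end{aligned}
\]

from which we deduce that $\tilde d(\mu_*, \mu_*^\varepsilon) \le 2\varepsilon\, C(t) \int_X V(x)\, \mu_*^\varepsilon(dx)$. If one can obtain an a priori bound on
$\mu_*^\varepsilon$ that ensures that $\int_X V(x)\, \mu_*^\varepsilon(dx)$ is bounded independently of $\varepsilon$, this shows that the distance between
the invariant measures for $\mathcal{P}_t$ and $\mathcal{P}_t^\varepsilon$ is comparable to the distance between the transition probabilities
for some fixed time $t$.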
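The same perturbation argument can be tested on a toy example in which everything is explicit. The sketch below uses a two-state Markov chain with transition matrix $[[1-a, a], [b, 1-b]]$, for which the total variation distance is contracted by exactly $c = |1 - a - b|$ in one step, and perturbs $a$ to $a + \varepsilon$. The contraction factor $c$ then plays the role of the factor $\tfrac12$ above. The two-state chain, the use of total variation in place of $\tilde d$, and the function names are assumptions made purely for illustration.

```python
def invariant(a, b):
    """Invariant probability vector of the chain [[1 - a, a], [b, 1 - b]]."""
    return (b / (a + b), a / (a + b))


def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def perturbation_bound(a, b, eps):
    """Return (actual distance, bound) between invariant measures of P and P^eps.

    The bound is (one-step kernel discrepancy averaged over the perturbed
    invariant measure) / (1 - c), mirroring the argument in the text.
    """
    c = abs(1.0 - a - b)                 # one-step TV contraction factor of P
    pi = invariant(a, b)
    pi_eps = invariant(a + eps, b)
    # Only the first row changes, and its TV distance to the old row is eps.
    one_step = pi_eps[0] * eps
    return tv(pi, pi_eps), one_step / (1.0 - c)
```

For this particular chain the bound turns out to be sharp, which shows that the factor $1/(1-c)$ (the analogue of the factor 2 in the text) cannot be improved in general.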
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This argument is still valid if the distance function d satisfies a weak form of the triangle inequality, 
i.e. if there exists a positive constant $K > 0$ such that

\[
\tilde d(x, y) \le K(\tilde d(x, z) + \tilde d(z, y)) \tag{4.9}
\]

for every $x, y, z \in X$. This turns out to be often satisfied in practice, due to the following result:

Lemma 4.11 Let $d \colon X \times X \to [0, 1]$ be a distance-like function and assume that there exists a constant
$K_d$ such that

\[
d(x, y) \le K_d(d(x, z) + d(z, y)) \tag{4.10}
\]

holds for every $x, y, z \in X$. Assume furthermore that $V \colon X \to \mathbb{R}_+$ is such that there exist constants $c$,
$C$ such that the implication

\[
d(x, z) \le c \quad \Rightarrow \quad V(x) \le C\, V(z) \tag{4.11}
\]

holds. Then, there exists a constant $K$ such that (4.9) holds for $\tilde d$ defined as in Theorem 4.7.

Proof. Note first that it is sufficient to show that there exists a constant $K$ such that

\[
d(x, y)(1 + V(x) + V(y)) \le K\bigl( d(x, z)(1 + V(x) + V(z)) + d(z, y)(1 + V(z) + V(y)) \bigr) . \tag{4.12}
\]

Since $d$ is symmetric, we can assume without loss of generality that $V(x) \ge V(y)$. We consider the
following two cases.

If $d(x, z) \ge c$, then the boundedness of $d$ implies the existence of a constant $C$ such that

\[
d(x, y)(1 + V(x) + V(y)) \le C(1 + V(x) + V(y)) \le \frac{C}{c}\, d(x, z)(1 + V(x) + V(y)) \le \frac{2C}{c}\, d(x, z)(1 + V(x)) ,
\]

from which (4.12) follows with $K = 2C/c$.

If $d(x, z) \le c$, we make use of our assumptions (4.10) and (4.11) to deduce that

\[
\begin{aligned}
d(x, y)(1 + V(x) + V(y)) &\le K_d(d(x, z) + d(z, y))(1 + V(x) + V(y)) \\
&\le 2K_d\, d(x, z)(1 + V(x)) + 2CK_d\, d(z, y)(1 + V(z)) ,
\end{aligned}
\]

from which (4.12) follows with $K = 2K_d(1 \vee C)$. □

Remark 4.12 If $X$ is a Banach space and $d(x, y) = 1 \wedge \|x - y\|$, then Lemma 4.11 essentially states
that $\tilde d$ satisfies (4.9) provided that $V(x)$ grows at most exponentially with $\|x\|$.
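A quick numerical sanity check of (4.12) in the setting of Remark 4.12 is sketched below on the real line, with $d(x, y) = 1 \wedge |x - y|$ and the exponentially growing weight $V(x) = e^{|x|}$. These choices, the sampling range and the function names are assumptions for illustration. With $c = \tfrac12$, $C = e^{1/2}$, $K_d = 1$ and $d \le 1$, the two cases of the proof give $K = \max(2/c,\ 2(1 \vee C)) = 4$, so the sampled ratios should never exceed 4.

```python
import math
import random


def d(x, y):
    """Truncated distance 1 ∧ |x - y| on the real line."""
    return min(1.0, abs(x - y))


def V(x):
    """Illustrative weight growing exponentially with |x|."""
    return math.exp(abs(x))


def w(x, y):
    """Square of d-tilde: d(x, y) * (1 + V(x) + V(y))."""
    return d(x, y) * (1.0 + V(x) + V(y))


def worst_ratio(trials=10000, seed=0):
    """Largest sampled value of w(x, y) / (w(x, z) + w(z, y))."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, y, z = (rng.uniform(-5.0, 5.0) for _ in range(3))
        rhs = w(x, z) + w(z, y)
        if rhs > 0.0:
            worst = max(worst, w(x, y) / rhs)
    return worst
```

Taking square roots, a bound $K$ in (4.12) yields the weak triangle inequality (4.9) for $\tilde d$ with constant $\sqrt{K}$, since $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$.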

The following result (which we already used in the previous section) relates the contraction property 
to the conditions in our main result on the convergence of transition probabilities. 

Proposition 4.13 Let (V t ) be a Feller Markov semigroup on X and assume that there exists a continuous 
metric d which generates the topology o/X and which is contracting for Vtfor some t > 0. Then the 
second condition in Theorem\2.4\is satisfied for the Markov kernel Vt- 



Before we turn to the proof of Proposition 4.13 we give the following result which is essential to 
settle the measurability questions arising in the proof: 
Lemma 4.14 Let $Q$ be a Feller Markov operator on a Polish space $X$ and let $d$ be a $[0, 1]$-valued distance-like function on $X \times X$ which is contracting for $Q$. Then there exist $\bar\alpha < 1$ and a Markov operator $T$ on $X \times X$ such that the transition probabilities of $T$ are couplings of the transition probabilities for $Q$ and such that the inequality

$$(Td)(x, y) \le \bar\alpha\, d(x, y),$$
holds for every $(x, y)$ such that $d(x, y) < 1$.

Proof. Denote as before by $\mathcal{M}(X \times X)$ the set of probability measures on $X \times X$ endowed with the topology of weak convergence, so that it is again a Polish space. Let $\bar\alpha \in (\alpha, 1)$, where $\alpha$ is as in Definition 4.5. For every $(x, y) \in X \times X$, denote

$$F(x, y) = \bigl\{\Gamma \in \mathcal{C}(Q(x, \cdot), Q(y, \cdot)) : \Gamma(d) \le \bar\alpha\, d(x, y)\bigr\},$$

and denote by $A$ the closure in $X \times X$ of the set $\{d(x, y) < 1\}$. We know that $F(x, y)$ is non-empty by assumption whenever $d(x, y) < 1$. The Feller property of $Q$ then ensures that this is also true for $(x, y) \in A$.

The proof of the statement is complete as soon as we can show that there exists a measurable map $\hat T : X \times X \to \mathcal{M}(X \times X)$ such that $\hat T(x, y) \in F(x, y)$ for every $(x, y) \in A$, since it then suffices to set for example

$$T\bigl((x, y), \cdot\bigr) = \begin{cases} \hat T(x, y) & \text{if } (x, y) \in A, \\ Q(x, \cdot) \otimes Q(y, \cdot) & \text{otherwise.} \end{cases}$$

Since the set $F(x, y)$ is closed for every pair $(x, y)$ by the continuity of $d$, this follows from the Kuratowski, Ryll-Nardzewski selection theorem [KR65, Wag77] provided we can show that, for every open set $U \subset \mathcal{M}(X \times X)$, the set $F^{-1}(U) = \{(x, y) : F(x, y) \cap U \neq \emptyset\}$ is measurable.

Since on a Polish space every open set is a countable union of closed sets and since $F^{-1}(U \cup V) = F^{-1}(U) \cup F^{-1}(V)$ (the same is not true in general for intersections!), the claim follows if we can show that $F^{-1}(U)$ is measurable for every closed set $U$. Under our assumptions, $F^{-1}(U)$ actually turns out to be closed if $U$ is closed. To see this, take a sequence $(x_n, y_n) \in F^{-1}(U)$ converging to a limit $(x, y)$. The definition of $F$ implies that there exist couplings $\Gamma_n \in U \cap \mathcal{C}(Q(x_n, \cdot), Q(y_n, \cdot))$ with $\Gamma_n(d) \le \bar\alpha\, d(x_n, y_n)$. Since $Q$ is Feller, the sequence $\{\Gamma_n\}$ is tight, so that there exists a subsequence converging to a limit $\Gamma$, which lies in $U$ since $U$ is closed. Since $\Gamma$ belongs to $\mathcal{C}(Q(x, \cdot), Q(y, \cdot))$ and since, by the continuity of $d$, we have $\Gamma(d) \le \bar\alpha\, d(x, y)$, it follows that $(x, y) \in F^{-1}(U)$ as claimed. □



Proof of Proposition 4.13. By assumption, there exists some $\alpha \in (0, 1)$ such that $d(\mathcal{P}_t(x, \cdot), \mathcal{P}_t(y, \cdot)) \le \alpha\, d(x, y)$ for all $x, y \in X$ which satisfy $d(x, y) < 1$. Consider the Markov operator $Q := \mathcal{P}_t$. Let $T$ be the Markov operator and $\bar\alpha < 1$ the constant from Lemma 4.14, so that if, for fixed $x, y \in X$, we denote the corresponding chain starting at $(x, y)$ by $(X_n, Y_n)$, then we have $\mathbf{E}\, d(X_1, Y_1) \le \bar\alpha\, d(x, y)$ whenever $d(x, y) < 1$. Let $\Gamma_{x,y}$ be the law of $(X_n, Y_n)$, $n \in \mathbf{N}_0$. Now we define

$$V_n := \bar\alpha^{-n}\, d(X_n, Y_n)$$

and $\tau := \inf\{n \in \mathbf{N} : d(X_n, Y_n) \ge 1\}$. Then $V_{n \wedge \tau}$, $n \in \mathbf{N}_0$, is a non-negative supermartingale and therefore

$$\mathbf{P}\{\tau < \infty\} \le \mathbf{P}\Bigl\{\sup_{n \ge 0} V_{n \wedge \tau} \ge 1\Bigr\} \le d(x, y),$$

i.e.

$$\Gamma_{x,y}\bigl\{d(X_n, Y_n) \le \bar\alpha^n \text{ for all } n \in \mathbf{N}\bigr\} \ge 1 - d(x, y).$$

This shows that the second assumption in Theorem 2.4 is satisfied for the chain associated to $Q$. □
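For the reader's convenience, the supermartingale step can be unpacked as follows. This is a sketch under the assumptions of the proof, writing $\bar\alpha$ for the constant from Lemma 4.14 and $\mathcal{F}_n$ for the natural filtration of the chain (the latter symbol is introduced here for illustration).

```latex
% On the event {n < \tau} we have d(X_n, Y_n) < 1, so Lemma 4.14 applies:
\mathbf{E}\bigl[V_{n+1} \,\big|\, \mathcal{F}_n\bigr]
  = \bar\alpha^{-(n+1)} (Td)(X_n, Y_n)
  \le \bar\alpha^{-(n+1)}\, \bar\alpha\, d(X_n, Y_n) = V_n .
% On {n \ge \tau} the stopped process is constant, so V_{n \wedge \tau}
% is a non-negative supermartingale with V_0 = d(x,y).
% The maximal inequality for non-negative supermartingales gives
\mathbf{P}\Bigl\{\sup_{n \ge 0} V_{n \wedge \tau} \ge 1\Bigr\} \le \mathbf{E} V_0 = d(x,y).
% If \tau < \infty, then V_\tau = \bar\alpha^{-\tau} d(X_\tau, Y_\tau) \ge 1,
% which yields the first inequality in the display of the proof.
% On the complementary event, \tau = \infty and V_n < 1 for all n, i.e.
d(X_n, Y_n) < \bar\alpha^{\,n} \quad \text{for all } n .
```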
5 Application of the spectral gap result to SDDEs

In this section, we apply the abstract results from the previous section to the problem of exponential 
convergence to an invariant measure for the type of stochastic delay equations considered earlier. The 
main problem will turn out to be to find a distance-like function d which is contracting. In order to 
obtain an exponential convergence result, we will have to assume, just like in the case of Harris chains 
[MT93, HM08b], some Lyapunov structure. We therefore introduce the following assumption:

Assumption 5.1 There exists a continuous function $V : \mathcal{C} \to \mathbf{R}_+$ such that $\lim_{\|X\| \to \infty} V(X) = +\infty$ and such that there exist strictly positive constants $C_V$, $\gamma$ and $K_V$ such that the bound

$$\mathbf{E}\, V(X_t) \le C_V\, e^{-\gamma t}\, V(X_0) + K_V,$$

holds for solutions to ( |i.7| ) with arbitrary initial conditions $X_0 \in \mathcal{C}$.

The distance-like function $d$ that we are going to use in this section is given by

$$d(X, Y) = 1 \wedge \delta^{-1}\|X - Y\|, \qquad (5.1)$$

for a suitable (small) constant $\delta$ to be determined later. We start by verifying that bounded sets are $d$-small for every value of $\delta$ and we will then proceed to showing that, under suitable assumptions, it is possible to find $\delta > 0$ such that $d$ is also contracting.

5.1 Bounded sets are d-small



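Proposition 5.2 Let the assumptions of Theorem 3.1 be satisfied, let $d$ be as in (5.1) and let $t > 2r$ be arbitrary. Then every bounded set is $d$-small for $\mathcal{P}_t$.

Proof. Fix $t > 2r$. We show that every closed ball $B_R \subset \mathcal{C}$ with center 0 and radius $R$ is $d$-small for $\mathcal{P}_t$. By Lemma 3.7 we know that $p := \inf_{x \in B_R} \mathcal{P}_t(x, B_{\delta/4}) > 0$. Let $x, y \in B_R$ and let $X$ and $Y$ be solutions of the delay equation with initial conditions $x$ and $y$ respectively. We couple $X$ and $Y$ independently. Then

$$d(\mathcal{P}_t(x, \cdot), \mathcal{P}_t(y, \cdot)) \le \mathbf{E}\bigl(1 \wedge \delta^{-1}\|X_t - Y_t\|\bigr)$$
$$\le \mathbf{P}\bigl(\{X_t \notin B_{\delta/4}\} \cup \{Y_t \notin B_{\delta/4}\}\bigr) + \tfrac{1}{2}\, \mathbf{P}\bigl\{X_t \in B_{\delta/4},\, Y_t \in B_{\delta/4}\bigr\}$$
$$\le 1 - \tfrac{1}{2}\, p^2$$

for all $x, y \in B_R$, so the claim follows. □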
5.2 The distance d is contracting 



Before we start, we give the following a priori estimate that shows that trajectories of (3.1) driven by the same realisation of the noise cannot separate too rapidly. More precisely, we have:

Proposition 5.3 Let the assumptions of Theorem 3.1 be satisfied. There exists $\kappa > 0$ such that the bound

$$\mathbf{E}\|X_t - \tilde X_t\|^4 \le e^{\kappa(1+t)^2}\, \|X_0 - \tilde X_0\|^4$$

holds for all $t \ge 0$ and any pair of initial conditions $X_0, \tilde X_0 \in \mathcal{C}$.
Proof. The proof is similar to the argument used in the proof of Theorem 3.1. Setting $Z(t) = X(t) - \tilde X(t)$, we have the bound

$$d|Z(t)|^2 = 2\langle f(X_t) - f(\tilde X_t), Z(t)\rangle\, dt + \|g(X_t) - g(\tilde X_t)\|^2\, dt + dM(t) \le K\|Z_t\|^2\, dt + dM(t),$$

where $M$ is a martingale with quadratic variation process bounded by $C\int_0^t \|Z_s\|^4\, ds$. Defining $M^*(t) = \sup_{s \le t} M(s)$, we thus obtain the bound

$$\|Z_t\|^2 \le \|Z_0\|^2 + K\int_0^t \|Z_s\|^2\, ds + M^*(t),$$

so that

$$\mathbf{E}\|Z_t\|^4 \le 3\,\mathbf{E}\Bigl(\|Z_0\|^4 + K^2\Bigl(\int_0^t \|Z_s\|^2\, ds\Bigr)^2 + \bigl(M^*(t)\bigr)^2\Bigr)$$
$$\le 3\Bigl(\|Z_0\|^4 + K^2 t\int_0^t \mathbf{E}\|Z_s\|^4\, ds + C\int_0^t \mathbf{E}\|Z_s\|^4\, ds\Bigr),$$

where we used the Burkholder-Davis-Gundy inequality [RY91] in order to bound the expectation of $(M^*)^2$. The claim follows from Gronwall's lemma. □
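The final appeal to Gronwall's lemma can be made explicit as follows. This is a sketch only: it reads the last bound of the proof as $\mathbf{E}\|Z_t\|^4 \le 3\|Z_0\|^4 + 3(K^2 t + C)\int_0^t \mathbf{E}\|Z_s\|^4\,ds$, assumes that $\varphi(t) := \mathbf{E}\|Z_t\|^4$ is locally bounded, and the particular choice of $\kappa$ is ours.

```latex
% Fix t > 0. For every s \in [0,t], since K^2 s \le K^2 t,
\varphi(s) \le 3\|Z_0\|^4 + 3\,(K^2 t + C)\int_0^s \varphi(u)\,du .
% Gronwall's lemma with the constant coefficient 3(K^2 t + C) on [0,t] gives
\varphi(t) \le 3\,\|Z_0\|^4 \exp\bigl(3K^2 t^2 + 3Ct\bigr).
% Finally, with \kappa := \max\{\log 3,\ 3C/2,\ 3K^2\},
\log 3 + 3Ct + 3K^2 t^2 \le \kappa + 2\kappa t + \kappa t^2 = \kappa (1+t)^2 ,
% so that
\mathbf{E}\|Z_t\|^4 \le e^{\kappa(1+t)^2}\,\|Z_0\|^4 .
```

The quadratic exponent thus comes from the factor $t$ produced by the Cauchy-Schwarz inequality applied to the time integral.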

In this subsection, we show one possible way of verifying that $d$ is contracting that is suited to our problem. It is by no means the only one. One can check for example that the procedure followed in [HM08a] allows one to construct a contracting distance for the degenerate 2D stochastic Navier-Stokes equations by using a gradient bound on the semigroup. A general version of this argument is presented in Section 5.3 below. For the problem at hand, however, it seems more appropriate and technically straightforward to consider a "binding construction" in the terminology of [MY02, Hai02, Mat02b].

We fix two initial conditions $X_0, \tilde X_0 \in \mathcal{C}$ and consider the construction from Section 3. We fix some $\gamma_0 > 0$ and choose $\lambda$ sufficiently large so that the conclusion of Lemma [33] holds. As in the proof of Theorem 3.1, we also introduce the stopping time $\tau = \inf\{t > 0 : \int_0^t |v(s)|^2\, ds \ge \varepsilon^{-1}\|X_0 - \tilde X_0\|^2\}$, where $v$ is as in the proof of Theorem 3.1. (Note that the value of $\varepsilon$ is not necessarily that from Section 3, but will be determined later.) We also define $\tilde v$ by $\tilde v(s) = v(s)\mathbf{1}_{\tau > s}$.

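This defines a map $\Psi$ from $\Omega := C([0, \infty), \mathbf{R}^m)$ to itself by $\Psi(w) = w + \int_0^{\cdot} \tilde v(s)\, ds$ (the map $\Psi$ furthermore depends on the initial conditions $X_0$ and $\tilde X_0$, but we suppress this dependence from the notation). The image $\tilde{\mathbf{P}}$ of Wiener measure $\mathbf{P}$ under $\Psi$ has a density $\mathcal{D}(w) = d\tilde{\mathbf{P}}/d\mathbf{P}$.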

The aim of introducing the cutoff is that if we define $\tilde{\mathcal{D}}(w) = 1/\mathcal{D}(w)$, we obtain "for free" bounds of the type

$$\int (1 - \mathcal{D}(w))^2\, \mathbf{P}(dw) \le C\varepsilon^{-1}\|X_0 - \tilde X_0\|^2, \qquad \int (1 - \tilde{\mathcal{D}}(w))^2\, \tilde{\mathbf{P}}(dw) \le C\varepsilon^{-1}\|X_0 - \tilde X_0\|^2,$$

for some constant $C > 0$, provided that we restrict ourselves to pairs of initial conditions such that

$$\|X_0 - \tilde X_0\|^2 \le \varepsilon. \qquad (5.2)$$

Had we not introduced the cut-off, we would need to get exponential integrability of $v$ first.

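The map $\Psi$ allows one to construct, for any two initial conditions $X_0$ and $\tilde X_0$, a coupling for $\mathbf{P}$ with itself in the following way. Define the map $\hat\Psi : \Omega \to \Omega \times \Omega$ by $\hat\Psi(w) = (w, \Psi(w))$, denote by $\pi_i$ the projection onto the $i$th component of $\Omega \times \Omega$, and set

$$\Pi_0(dw_1, dw_2) = \bigl(1 \wedge \tilde{\mathcal{D}}(w_2)\bigr)\,(\hat\Psi^{\#}\mathbf{P})(dw_1, dw_2),$$
$$\Pi(dw_1, dw_2) = \Pi_0(dw_1, dw_2) + Z^{-1}(\mathbf{P} - \pi_1^{\#}\Pi_0)(dw_1)\,(\mathbf{P} - \pi_2^{\#}\Pi_0)(dw_2),$$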
where $Z = 1 - \Pi_0(\Omega \times \Omega) = \tfrac{1}{2}\|\mathbf{P} - \tilde{\mathbf{P}}\|_{\mathrm{TV}}$ is a suitable constant. One can check that $\Pi$ as defined above is a coupling for $\mathbf{P}$ and $\mathbf{P}$. Furthermore, it is designed in such a way that it maximises the mass of the set $\Omega_0 = \{(w, w') : w' = \Psi(w)\}$. We claim that this coupling is designed in such a way that its image under the product solution map of (3.1) allows one to verify that $d$ as in (5.1) is contracting for some sufficiently small value of $\delta > 0$ to be determined later. Note that, since the bound (4.6) only needs to be checked for pairs of initial conditions with $d(X_0, \tilde X_0) < 1$, the constraint (5.2) is satisfied provided that we make sure that $\delta^2 \le \varepsilon$.

In order to see that this is indeed the case, we fix a terminal time $t$ and we break the space $\Omega \times \Omega = \Omega_1 \cup \Omega_2 \cup \Omega_3$ into three parts:

$$\Omega_1 = \{(w, w') : w' = \Psi(w) \ \&\ \tau(w) \ge t\},$$
$$\Omega_2 = \{(w, w') : w' = \Psi(w) \ \&\ \tau(w) < t\},$$
$$\Omega_3 = \{(w, w') : w' \neq \Psi(w)\}.$$



Here, we made use of the stopping time $\tau$ defined at the beginning of this section. We consider the set $\Omega_1$ as being the set of "good" realisations and we will show that $\Omega_1$ has high probability. The contributions from the other two sets will be considered as error terms.

Consider now a pair $(X_0, \tilde X_0)$ of initial conditions such that $d(X_0, \tilde X_0) < 1$, which is to say that $\|X_0 - \tilde X_0\| < \delta$. Denote by $X_t$ and $\tilde X_t$ the solutions to (3.1) driven by the noise realisations $w$ and $w'$ respectively. We then have

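$$\int_{\Omega_1} d(X_t(w), \tilde X_t(w'))\, \Pi(dw, dw') \le \delta^{-1}\int_{\Omega_1} \|X_t(w) - \tilde X_t(w')\|\, \Pi(dw, dw')$$
$$\le \delta^{-1}\int_{\tau \ge t} \|X_t(w) - \tilde X_t(\Psi(w))\|\, \mathbf{P}(dw) \le C\delta^{-1} e^{-\gamma_0 t}\|X_0 - \tilde X_0\|$$
$$= C e^{-\gamma_0 t}\, d(X_0, \tilde X_0),$$

where we made use of the bounds obtained in Lemma 3.5. Regarding the integral over $\Omega_2$, we combine Lemma 3.5, Proposition 5.3 and the strong Markov property to conclude that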



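$$\int_{\Omega_2} d(X_t(w), \tilde X_t(w'))\, \Pi(dw, dw') \le \delta^{-1}\int_{\tau < t} \|X_t(w) - \tilde X_t(\Psi(w))\|\, \mathbf{P}(dw)$$
$$\le \delta^{-1}\Bigl(\int \|X_t(w) - \tilde X_t(\Psi(w))\|^2\, \mathbf{P}(dw)\Bigr)^{1/2}\sqrt{\mathbf{P}(\tau < t)}$$
$$\le \delta^{-1}\,\mathbf{E}\bigl(C e^{-\gamma_0 \tau} e^{\kappa(1 + t - \tau)^2}\bigr)\,\|X_0 - \tilde X_0\|\,\sqrt{\mathbf{P}(\tau < t)}$$
$$\le C e^{\kappa(1+t)^2}\, d(X_0, \tilde X_0)\,\sqrt{\mathbf{P}(\tau < t)}.$$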



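At this stage, we combine Lemma 3.5 with Chebyshev's inequality to conclude that

$$\mathbf{P}(\tau < t) \le \mathbf{P}\Bigl(\int_0^\infty |v(s)|^2\, ds \ge \varepsilon^{-1}\|X_0 - \tilde X_0\|^2\Bigr) \le C\varepsilon,$$
for some constant $C$ independent of $t$ and the pair $(X_0, \tilde X_0)$. Finally, we obtain the bound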



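$$\int_{\Omega_3} d(X_t(w), \tilde X_t(w'))\, \Pi(dw, dw') \le \Pi(\Omega_3) = \int \bigl(1 - 1 \wedge \tilde{\mathcal{D}}(w)\bigr)\, \tilde{\mathbf{P}}(dw)$$
$$= \int \bigl(0 \vee (1 - \tilde{\mathcal{D}}(w))\bigr)\, \tilde{\mathbf{P}}(dw) \le \Bigl(\int (1 - \tilde{\mathcal{D}}(w))^2\, \tilde{\mathbf{P}}(dw)\Bigr)^{1/2}$$
$$\le C\varepsilon^{-1/2}\|X_0 - \tilde X_0\| \le C\delta\varepsilon^{-1/2}\, d(X_0, \tilde X_0).$$

The required bound follows by first taking $\varepsilon$ small enough and then taking $\delta$ small enough.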
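To make the order of the choices explicit, the three contributions can be collected as follows. This is a sketch under the assumption that the constants $C$ appearing in the three bounds depend neither on $t$, $\varepsilon$ nor $\delta$; the thresholds $1/4$ are chosen for illustration only, and $\tilde X$ denotes the solution started from the second initial condition.

```latex
% Sum of the contributions of \Omega_1, \Omega_2 and \Omega_3,
% using P(\tau < t) \le C\varepsilon in the second term:
\int_{\Omega \times \Omega} d\bigl(X_t(w), \tilde X_t(w')\bigr)\,\Pi(dw,dw')
 \le \Bigl( C e^{-\gamma_0 t}
   + C e^{\kappa(1+t)^2} \sqrt{C\varepsilon}
   + C \delta\, \varepsilon^{-1/2} \Bigr)\, d(X_0, \tilde X_0).
% Order of choices:
% (i)   t large enough that                C e^{-\gamma_0 t} \le 1/4;
% (ii)  then \varepsilon small enough that C e^{\kappa(1+t)^2}\sqrt{C\varepsilon} \le 1/4;
% (iii) then \delta small enough that      C\delta\varepsilon^{-1/2} \le 1/4
%       and \delta^2 \le \varepsilon, so that (5.2) holds.
% With these choices the right-hand side is at most (3/4) d(X_0, \tilde X_0).
```

Since the image of the coupling under the product solution map couples the two transition probabilities at time $t$, a bound of this form is what is needed for $d$ to be contracting.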

5.3 Construction of contracting distances for SPDEs 

Finally, we want to show how the existence of a contracting distance for a Markov semigroup $\mathcal{P}_t$ can be verified in the case of stochastic PDEs. This is very similar to the calculation performed in [HM08a], but it has the advantage of not being specific to the Navier-Stokes equations. Recall that [HM08c] yields conditions under which the Markov semigroup (over some separable Hilbert space $\mathcal{H}$) generated by a class of stochastic PDEs satisfies the following gradient bound for every function $\varphi \in C^1(\mathcal{H}, \mathbf{R})$:

$$\|D\mathcal{P}_t\varphi(X)\| \le W(X)\Bigl(e^{-\gamma t}\sqrt{(\mathcal{P}_t\|D\varphi\|^2)(X)} + C\|\varphi\|_\infty\Bigr). \qquad (5.3)$$

Here, $W : \mathcal{H} \to \mathbf{R}_+$ is some continuous function that controls the regularising properties of $\mathcal{P}_t$ and $\gamma$, $C$ are some strictly positive constants. It turns out that if the semigroup $\mathcal{P}_t$ has sufficiently good dissipativity properties with respect to $W$, then one can find a contracting distance function for it. Before we state the result, let us define a family of "weighted metrics" $\varrho_p$ on $\mathcal{H}$ by

$$\varrho_p(X, Y) = \inf_{\gamma} \int_0^1 W^p(\gamma(t))\,\|\dot\gamma(t)\|\, dt,$$

where the infimum runs over all Lipschitz continuous paths $\gamma : [0, 1] \to \mathcal{H}$ connecting $X$ to $Y$. With this notation at hand, we have:



Proposition 5.4 Let $\{\mathcal{P}_t\}_{t \ge 0}$ be a Markov semigroup over a separable Hilbert space $\mathcal{H}$ satisfying the bound (5.3) for some continuous function $W : \mathcal{H} \to [1, \infty)$. Assume furthermore that there exists $p > 1$, a time $t_\star > 0$ and a constant $C > 0$ such that the bound

$$\mathcal{P}_t W^{2p} \le C\, W^{2p-2}, \qquad (5.4)$$

holds for every $t \ge t_\star$. (In other words, $W$ is a kind of super-Lyapunov function for $\mathcal{P}_t$.) Then, there exist $\delta > 0$ and $T > 0$ such that the metric $d(X, Y) = 1 \wedge \delta^{-1}\varrho_p(X, Y)$ is contracting for $\mathcal{P}_T$.

Proof. By Monge-Kantorovich duality, it is sufficient to show that there exist $T > 0$ and $\delta > 0$ such that the bound

$$|\mathcal{P}_T\varphi(X) - \mathcal{P}_T\varphi(Y)| \le \frac{\varrho_p(X, Y)}{2\delta}, \qquad (5.5)$$

holds for every $C^1$ function $\varphi : \mathcal{H} \to \mathbf{R}$ which has Lipschitz constant 1 with respect to $d$. Note now that such a function $\varphi$ satisfies

$$|\varphi(X)| \le \frac{1}{2}, \qquad \|D\varphi(X)\| \le \frac{W^p(X)}{\delta}.$$

In particular, it follows from the gradient bound (5.3) combined with (5.4) that for $T \ge t_\star$, one has

$$\|D\mathcal{P}_T\varphi(X)\| \le W(X)\Bigl(\delta^{-1} e^{-\gamma T}\sqrt{C}\, W^{p-1}(X) + \frac{C}{2}\Bigr).$$
Choosing $T$ sufficiently large and $\delta$ sufficiently small, we see that it is possible to ensure that

$$\|D\mathcal{P}_T\varphi(X)\| \le \frac{W^p(X)}{2\delta}.$$

Since, on the other hand, for any path $\gamma$ connecting $X$ to $Y$ we have

$$|\mathcal{P}_T\varphi(X) - \mathcal{P}_T\varphi(Y)| \le \int_0^1 \|D\mathcal{P}_T\varphi(\gamma(t))\|\,\|\dot\gamma(t)\|\, dt,$$

the requested bound (5.5) follows at once.



□ 
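One admissible choice of $T$ and $\delta$ in the proof above can be written out as follows. In this sketch we write $C_1$ for the constant in (5.3) and $C_2$ for the constant in (5.4); these labels, and the specific thresholds, are ours and are introduced only to keep the two constants apart.

```latex
% Gradient bound for a test function with |\varphi| \le 1/2 and
% \|D\varphi\| \le W^p/\delta, using (5.3), (5.4) and T \ge t_\star:
\|D\mathcal{P}_T\varphi(X)\|
  \le W(X)\Bigl(\delta^{-1} e^{-\gamma T}\sqrt{C_2}\,W^{p-1}(X) + \tfrac{C_1}{2}\Bigr)
  \le \Bigl(e^{-\gamma T}\sqrt{C_2} + \tfrac{C_1\delta}{2}\Bigr)\frac{W^p(X)}{\delta},
% where the last step uses W \ge 1 and p \ge 1, so that W \le W^p.
% Choosing T \ge t_\star with e^{-\gamma T}\sqrt{C_2} \le 1/4
% and \delta \le 1/(2C_1) gives
\|D\mathcal{P}_T\varphi(X)\| \le \frac{W^p(X)}{2\delta}.
% Integrating along a path \gamma from X to Y and taking the infimum over paths:
|\mathcal{P}_T\varphi(X) - \mathcal{P}_T\varphi(Y)|
  \le \frac{1}{2\delta}\,\inf_\gamma \int_0^1 W^p(\gamma(t))\,\|\dot\gamma(t)\|\,dt
  = \frac{\varrho_p(X,Y)}{2\delta}.
```

When $d(X, Y) < 1$ one has $d(X, Y) = \delta^{-1}\varrho_p(X, Y)$, so this bound corresponds to a contraction factor of $1/2$.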